# Working With Pandas DataFrames in Python

# Questions

How can I import data in Python?

What is Pandas?

Why should I use Pandas to work with data?

# Objectives

Navigate the workshop directory and download a dataset.

Explain what a library is and what libraries are used for.

Describe what the Python Data Analysis Library (Pandas) is.

Load the Python Data Analysis Library (Pandas).

Use read_csv to read tabular data into Python.

Describe what a DataFrame is in Python.

Access and summarize data stored in a DataFrame.


### Data Set Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine.  Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

Attribute Information:
Input variables (based on physicochemical tests): 

1 - fixed acidity 

2 - volatile acidity 

3 - citric acid 

4 - residual sugar 

5 - chlorides 

6 - free sulfur dioxide 

7 - total sulfur dioxide 

8 - density 

9 - pH 

10 - sulphates 

11 - alcohol 

Output variable (based on sensory data): 
12 - quality (score between 0 and 10)

Explore `winequality-red.csv` and `winequality-white.csv` 

# Pandas in Python

One of the best options for working with tabular data in Python is the Python Data Analysis Library 
(a.k.a. Pandas). The Pandas library provides data structures, produces high-quality plots with matplotlib, and 
integrates nicely with other libraries that use NumPy arrays (NumPy is another Python library).

Python doesn’t load all of the libraries available to it by default. We have to add an import statement 
to our code in order to use library functions. To import a library, we use the syntax `import libraryName`. 
If we want to give the library a nickname to shorten the command, we can add `as nickNameHere`. 
An example of importing the pandas library using the common nickname `pd` is below.

In [1]:
import pandas as pd

Each time we call a function that’s in a library, we use the syntax LibraryName.FunctionName. 
Adding the library name with a . before the function name tells Python where to find the function. 
In the example above, we have imported Pandas as pd. 
This means we don’t have to type out pandas each time we call a Pandas function.

### Series in Pandas
A Series is a one-dimensional labeled array. It can hold any data type, but all values in one Series share a single data type (dtype).

In [2]:
s1 = pd.Series([10,20,30,40])
print(s1)
s1.index

0    10
1    20
2    30
3    40
dtype: int64


RangeIndex(start=0, stop=4, step=1)

In [3]:
s2 = pd.Series([10,20,30,40],index=["A","B","C","D"])
print(s2)
s2.index

A    10
B    20
C    30
D    40
dtype: int64


Index(['A', 'B', 'C', 'D'], dtype='object')

### Subsetting a series

In [4]:
# Subsetting by position (0-based); .iloc makes the positional lookup explicit
s2.iloc[2]

30

In [5]:
# Subsetting by label
s2['B']

20

In [6]:
# Subsetting by range
s2[1:3]

B    20
C    30
dtype: int64

In [7]:
# Subsetting by multiple labels
s2[['A','C']]

A    10
C    30
dtype: int64

In [8]:
# Subsetting by logic
s2[s2>25]

C    30
D    40
dtype: int64
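
The square-bracket lookups above mix two ideas: selecting by position and selecting by label. A minimal sketch of the explicit accessors, `.iloc` (position) and `.loc` (label), is shown here; it rebuilds `s2` with the same values so that it runs on its own, and the other variable names are illustrative only.

```python
import pandas as pd

# Same values and labels as s2 in the cells above.
s2 = pd.Series([10, 20, 30, 40], index=["A", "B", "C", "D"])

# .iloc selects by integer position (0-based; the end of a slice is excluded).
third_value = s2.iloc[2]
middle_by_position = s2.iloc[1:3]

# .loc selects by label (both ends of a label slice are included).
value_b = s2.loc["B"]
middle_by_label = s2.loc["B":"C"]

print(third_value, value_b)
print(middle_by_position.equals(middle_by_label))  # True: both give B and C
```

Note that the label slice `"B":"C"` includes `C`, while the positional slice `1:3` stops before position 3.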

### A DataFrame is a collection of Series
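
As a small sketch of this idea, the example below builds a DataFrame from two Series that share the same labels and then pulls one column back out. The labels, values and variable names are made up for illustration and are not taken from the wine files.

```python
import pandas as pd

# Two Series with the same labels (hypothetical values).
ph = pd.Series([3.5, 3.2, 3.3], index=["a", "b", "c"])
quality = pd.Series([5, 5, 6], index=["a", "b", "c"])

# Each dictionary key becomes a column name; the shared labels become the row index.
small_df = pd.DataFrame({"pH": ph, "quality": quality})

# Selecting a single column returns a Series again.
column = small_df["quality"]
print(type(column))
print(small_df.dtypes)  # each column keeps its own dtype
```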

# Reading CSV Data Using Pandas

We will begin by locating and reading our wine data, which are in CSV format. 
CSV stands for Comma-Separated Values and is a common way to store formatted data. 
Other symbols may also be used, so you might see tab-separated, colon-separated or space-separated files. 
It is quite easy to replace one separator with another, to match your application. 
The first line in the file often has headers to explain what is in each column. 
CSV files (and files with other separators) make it easy to share data, and can be imported and exported from many applications, 
including Microsoft Excel. For more details on CSV files, see the Data Organisation in Spreadsheets lesson. 
We can use Pandas’ read_csv function to pull the file directly into a DataFrame.
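
The separator matters when reading a file. The sketch below reads a tiny semicolon-separated table twice, once with the default comma separator and once with `sep=";"`. It uses an in-memory string instead of a file so that it runs anywhere; the three columns and their values are a made-up stand-in, not the real dataset.

```python
import io

import pandas as pd

# A tiny semicolon-separated table held in memory (stand-in for a file on disk).
raw_text = "fixed acidity;pH;quality\n7.4;3.51;5\n7.8;3.20;5\n"

# The default separator is a comma, so each line is read as one single column.
wrong = pd.read_csv(io.StringIO(raw_text))

# Passing sep=";" splits the columns as intended.
right = pd.read_csv(io.StringIO(raw_text), sep=";")

print(wrong.shape)          # (2, 1)
print(right.shape)          # (2, 3)
print(list(right.columns))  # ['fixed acidity', 'pH', 'quality']
```

If a freshly loaded DataFrame has one very long column name, a mismatched separator is a likely cause.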

# So What’s a DataFrame?

A DataFrame is a 2-dimensional data structure that can store data of different types (including strings, integers, 
floating point values, categorical data and more) in columns. It is similar to a spreadsheet, an SQL table, or the data.frame 
in R. A DataFrame always has an index, which labels each row. By default it is a 0-based range of integers, so each label matches the position of the row.

In [20]:
import pandas as pd
# Note that pd.read_csv is used because we imported pandas as pd
red_df = pd.read_csv(r'C:\Users\itspark\Documents\Analytics\dataset/red_wine.csv')
white_df = pd.read_csv(r'C:\Users\itspark\Documents\Analytics\dataset/white_wine.csv')

In [21]:
red_df

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.700,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5
1,7.8,0.880,0.00,2.6,0.098,25.0,67.0,0.99680,3.20,0.68,9.8,5
2,7.8,0.760,0.04,2.3,0.092,15.0,54.0,0.99700,3.26,0.65,9.8,5
3,11.2,0.280,0.56,1.9,0.075,17.0,60.0,0.99800,3.16,0.58,9.8,6
4,7.4,0.700,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5
...,...,...,...,...,...,...,...,...,...,...,...,...
1594,6.2,0.600,0.08,2.0,0.090,32.0,44.0,0.99490,3.45,0.58,10.5,5
1595,5.9,0.550,0.10,2.2,0.062,39.0,51.0,0.99512,3.52,0.76,11.2,6
1596,6.3,0.510,0.13,2.3,0.076,29.0,40.0,0.99574,3.42,0.75,11.0,6
1597,5.9,0.645,0.12,2.0,0.075,32.0,44.0,0.99547,3.57,0.71,10.2,5


In [22]:
white_df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6


In [2]:
# Check output - .head() displays the first 5 rows by default
red_df.head()
# To display the first 30 rows, slice the DataFrame instead:
#red_df[:30]
#red_df[0:30]

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5


In [5]:
# To check out number of rows/index and columns
red_df.shape

(1599, 12)

In [6]:
# The .head() method displays the first rows of a DataFrame (5 by default).
red_df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5


In [8]:
# We can use the type() function to see what kind of thing red_df is:
type(red_df)
#  it’s a DataFrame (or, to use the full name that Python uses to refer to it internally, 
# a pandas.core.frame.DataFrame).

pandas.core.frame.DataFrame

What kind of things does red_df contain? DataFrames have an attribute called .dtypes that answers this:

# Types of Data

How information is stored in a DataFrame or a Python object affects what we can do with it and the outputs of calculations as well. There are two main types of data that we will explore in this lesson: numeric and text data types.

# Numeric Data Types

Numeric data types include integers and floats. A floating point (known as a float) number has decimal points even if that decimal point value is 0. For example: 1.13, 2.0, 1234.345. If we have a column that contains both integers and floating point numbers, Pandas will assign the entire column to the float data type so the decimal points are not lost.

An integer will never have a decimal point. Thus if we wanted to store 1.13 as an integer it would be stored as 1. Similarly, 1234.345 would be stored as 1234. You will often see the data type int64 in Pandas, which stands for 64-bit integer. The 64 refers to the number of bits of memory allocated to store each value, which determines the range of integers a “cell” can hold. Allocating space ahead of time allows computers to optimize storage and processing efficiency.
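
The two rules above can be checked directly. This is a minimal sketch with made-up values: a Series that mixes integers and floats gets the float64 dtype, and converting floats to integers drops the fractional part.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed = pd.Series([1, 2.5, 3])

print(ints.dtype)   # int64
print(mixed.dtype)  # float64: the integers are stored as 1.0 and 3.0

# Converting floats to integers truncates the decimal part.
truncated = pd.Series([1.13, 1234.345]).astype("int64")
print(truncated.tolist())  # [1, 1234]
```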

# Text Data Type

Text data is known as strings in Python and is usually stored with the object data type in Pandas. Strings can contain numbers and/or characters. For example, a string might be a word, a sentence, or several sentences. A Pandas object value might also be a plot name like ‘plot1’. A string can also contain or consist of numbers. For instance, ‘1234’ could be stored as a string, as could ‘10.23’. However, strings that contain numbers cannot be used for mathematical operations!
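
A short sketch of why numeric-looking strings are a problem, and one way to convert them. It uses the two example strings from the paragraph above; `pd.to_numeric` is a standard Pandas function, and the variable names are illustrative.

```python
import pandas as pd

as_text = pd.Series(["1234", "10.23"])

# "+" on strings joins them together instead of adding numbers.
doubled_text = as_text + as_text
print(doubled_text.tolist())  # ['12341234', '10.2310.23']

# pd.to_numeric converts the strings to real numbers.
as_numbers = pd.to_numeric(as_text)
print(as_numbers.dtype)       # float64
print((as_numbers * 2).tolist())
```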

Pandas and base Python use slightly different names for data types. 

| Pandas type | Native Python type |
|---|---|
| object | str (string) |
| int64 | int |
| float64 | float |
| datetime64 | N/A |

The 64 refers to the number of bits of memory allocated to hold each value.

In [9]:
red_df.dtypes

fixed acidity           float64
volatile acidity        float64
citric acid             float64
residual sugar          float64
chlorides               float64
free sulfur dioxide     float64
total sulfur dioxide    float64
density                 float64
pH                      float64
sulphates               float64
alcohol                 float64
quality                   int64
dtype: object

In [10]:
white_df.dtypes

fixed acidity           float64
volatile acidity        float64
citric acid             float64
residual sugar          float64
chlorides               float64
free sulfur dioxide     float64
total sulfur dioxide    float64
density                 float64
pH                      float64
sulphates               float64
alcohol                 float64
quality                   int64
dtype: object

There are many ways to summarize and access the data stored in DataFrames, using attributes and methods provided by the DataFrame object.

To access an attribute, use the DataFrame object name followed by the attribute name: `df_object.attribute`. Using the DataFrame red_df and the attribute columns, an index of all the column names in the DataFrame can be accessed with `red_df.columns`.
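
The difference between an attribute and a method shows up in the parentheses. The sketch below uses a small made-up DataFrame (not the wine data) so that it runs on its own.

```python
import pandas as pd

# Hypothetical three-row table for illustration.
df = pd.DataFrame({"alcohol": [9.4, 9.8, 9.8], "quality": [5, 5, 6]})

# Attributes describe the object and are written without parentheses.
print(df.columns)
print(df.shape)

# Methods do some work and are called with parentheses (and optional arguments).
first_two = df.head(2)
print(first_two)
```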

# Challenge 1 - DataFrames

Using our DataFrame red_df, try out the attributes & methods below to see what they return.

a. red_df.index

b. red_df.columns

c. red_df.shape Take note of the output of shape - what format does it return the shape of the DataFrame in?

d. red_df.head() Also, what does red_df.head(15) do?

e. red_df.tail()

In [12]:
red_df.index

RangeIndex(start=0, stop=1599, step=1)

In [13]:
red_df.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='object')

In [21]:
red_df.shape

(1599, 12)

In [22]:
# understand type of object for .shape output
type(red_df.shape)

tuple

In [23]:
red_df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5


In [24]:
red_df.head(15)

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
5,7.4,0.66,0.0,1.8,0.075,13.0,40.0,0.9978,3.51,0.56,9.4,5
6,7.9,0.6,0.06,1.6,0.069,15.0,59.0,0.9964,3.3,0.46,9.4,5
7,7.3,0.65,0.0,1.2,0.065,15.0,21.0,0.9946,3.39,0.47,10.0,7
8,7.8,0.58,0.02,2.0,0.073,9.0,18.0,0.9968,3.36,0.57,9.5,7
9,7.5,0.5,0.36,6.1,0.071,17.0,102.0,0.9978,3.35,0.8,10.5,5


In [25]:
red_df.tail()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
1594,6.2,0.6,0.08,2.0,0.09,32.0,44.0,0.9949,3.45,0.58,10.5,5
1595,5.9,0.55,0.1,2.2,0.062,39.0,51.0,0.99512,3.52,0.76,11.2,6
1596,6.3,0.51,0.13,2.3,0.076,29.0,40.0,0.99574,3.42,0.75,11.0,6
1597,5.9,0.645,0.12,2.0,0.075,32.0,44.0,0.99547,3.57,0.71,10.2,5
1598,6.0,0.31,0.47,3.6,0.067,18.0,42.0,0.99549,3.39,0.66,11.0,6


In [26]:
red_df[30:50]

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
30,6.7,0.675,0.07,2.4,0.089,17.0,82.0,0.9958,3.35,0.54,10.1,5
31,6.9,0.685,0.0,2.5,0.105,22.0,37.0,0.9966,3.46,0.57,10.6,6
32,8.3,0.655,0.12,2.3,0.083,15.0,113.0,0.9966,3.17,0.66,9.8,5
33,6.9,0.605,0.12,10.7,0.073,40.0,83.0,0.9993,3.45,0.52,9.4,6
34,5.2,0.32,0.25,1.8,0.103,13.0,50.0,0.9957,3.38,0.55,9.2,5
35,7.8,0.645,0.0,5.5,0.086,5.0,18.0,0.9986,3.4,0.55,9.6,6
36,7.8,0.6,0.14,2.4,0.086,3.0,15.0,0.9975,3.42,0.6,10.8,6
37,8.1,0.38,0.28,2.1,0.066,13.0,30.0,0.9968,3.23,0.73,9.7,7
38,5.7,1.13,0.09,1.5,0.172,7.0,19.0,0.994,3.5,0.48,9.8,4
39,7.3,0.45,0.36,5.9,0.074,12.0,87.0,0.9978,3.33,0.83,10.5,5


In [28]:
# Gives data types and information on null/missing values
red_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
fixed acidity           1599 non-null float64
volatile acidity        1599 non-null float64
citric acid             1599 non-null float64
residual sugar          1599 non-null float64
chlorides               1599 non-null float64
free sulfur dioxide     1599 non-null float64
total sulfur dioxide    1599 non-null float64
density                 1599 non-null float64
pH                      1599 non-null float64
sulphates               1599 non-null float64
alcohol                 1599 non-null float64
quality                 1599 non-null int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB


### Let's look at the quality Output Variable

In [29]:
red_df['quality']
#or
#red_df.quality

0       5
1       5
2       5
3       6
4       5
       ..
1594    5
1595    6
1596    6
1597    5
1598    6
Name: quality, Length: 1599, dtype: int64

The full Series has too many rows to read, and the same few values are repeated many times.

In [30]:
# Check unique elements in our series
pd.unique(red_df['quality'])
# or
red_df['quality'].unique()

array([5, 6, 7, 4, 8, 3], dtype=int64)

In [31]:
# Check number of unique elements in our series
red_df['quality'].nunique()

6

In [32]:
red_df['quality'].value_counts()

5    681
6    638
7    199
4     53
8     18
3     10
Name: quality, dtype: int64

In [33]:
# Get summary statistics of a continuous (float) series
red_df['fixed acidity'].describe()

count    1599.000000
mean        8.319637
std         1.741096
min         4.600000
25%         7.100000
50%         7.900000
75%         9.200000
max        15.900000
Name: fixed acidity, dtype: float64

# Challenge 2 - Summarize dataframes

1. Summary statistics for all series in dataframe
2. count() of observations for each 'quality' type in red_df: Hint .groupby

In [29]:
red_df.describe()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
count,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0,1599.0
mean,8.319637,0.527821,0.270976,2.538806,0.087467,15.874922,46.467792,0.996747,3.311113,0.658149,10.422983,5.636023
std,1.741096,0.17906,0.194801,1.409928,0.047065,10.460157,32.895324,0.001887,0.154386,0.169507,1.065668,0.807569
min,4.6,0.12,0.0,0.9,0.012,1.0,6.0,0.99007,2.74,0.33,8.4,3.0
25%,7.1,0.39,0.09,1.9,0.07,7.0,22.0,0.9956,3.21,0.55,9.5,5.0
50%,7.9,0.52,0.26,2.2,0.079,14.0,38.0,0.99675,3.31,0.62,10.2,6.0
75%,9.2,0.64,0.42,2.6,0.09,21.0,62.0,0.997835,3.4,0.73,11.1,6.0
max,15.9,1.58,1.0,15.5,0.611,72.0,289.0,1.00369,4.01,2.0,14.9,8.0


In [30]:
# Gives total count of values in a variable
red_df['quality'].count()

1599

In [34]:
# Gives the count for each unique value of the variable
red_df.groupby('quality').count()

quality,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol
3,10,10,10,10,10,10,10,10,10,10,10
4,53,53,53,53,53,53,53,53,53,53,53
5,681,681,681,681,681,681,681,681,681,681,681
6,638,638,638,638,638,638,638,638,638,638,638
7,199,199,199,199,199,199,199,199,199,199,199
8,18,18,18,18,18,18,18,18,18,18,18


In [32]:
# count for only one of the series in a groupby
red_df.groupby('quality').count()['fixed acidity']

quality
3     10
4     53
5    681
6    638
7    199
8     18
Name: fixed acidity, dtype: int64

In [33]:
# summary statistics for only one of the series in a groupby
red_df.groupby('quality').describe()['fixed acidity']

quality,count,mean,std,min,25%,50%,75%,max
3,10.0,8.36,1.770875,6.7,7.15,7.5,9.875,11.6
4,53.0,7.779245,1.626624,4.6,6.8,7.5,8.4,12.5
5,681.0,8.167254,1.563988,5.0,7.1,7.8,8.9,15.9
6,638.0,8.347179,1.797849,4.7,7.0,7.9,9.4,14.3
7,199.0,8.872362,1.992483,4.9,7.4,8.8,10.1,15.6
8,18.0,8.566667,2.119656,5.0,7.25,8.25,10.225,12.6
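
The grouped cells above compute a result for every column and then pick one column out. As a possible alternative, the sketch below selects the column first and asks only for the statistics needed. It uses a small made-up table instead of red_df, so the numbers are not the wine results.

```python
import pandas as pd

# Hypothetical six-row table with the same column names as the wine data.
df = pd.DataFrame({
    "quality": [5, 5, 6, 6, 6, 7],
    "fixed acidity": [7.4, 7.8, 11.2, 7.9, 7.3, 7.8],
})

# .size() counts the rows in each group, whatever the columns contain.
per_quality = df.groupby("quality").size()
print(per_quality)

# Select one column first, then compute only the chosen statistics.
acidity_summary = df.groupby("quality")["fixed acidity"].agg(["count", "mean"])
print(acidity_summary)
```

Unlike `.count()`, `.size()` includes rows with missing values, which makes no difference here because the table has none.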


# Missing Data Values - NaN

A dataset may contain NaN (Not a Number) values. NaN values are undefined values that cannot be represented mathematically. Pandas, for example, will read an empty cell in a CSV or Excel sheet as a NaN. NaNs have some desirable properties: for example, Pandas skips them by default when computing summaries such as a sum or a mean.
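
A minimal sketch of how NaN behaves in calculations, using a made-up three-value Series. `skipna` is a real parameter of `Series.mean`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0])

# By default the NaN is skipped, so the mean is (1 + 3) / 2.
print(values.mean())              # 2.0

# With skipna=False the NaN propagates into the result.
print(values.mean(skipna=False))  # nan

# count() only counts the non-missing values.
print(values.count(), len(values))  # 2 3

# NaN is not equal to anything, including itself, so use isnull() to find it.
print(np.nan == np.nan)           # False
```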

# Where Are the NaN’s?

In [34]:
# Which rows have 'quality' = NaN
red_df['quality'].isnull()

0       False
1       False
2       False
3       False
4       False
        ...  
1594    False
1595    False
1596    False
1597    False
1598    False
Name: quality, Length: 1599, dtype: bool

# Challenge 3 - Counting missing values (NaN)

Count the number of missing values per column. Hint: The method .sum() gives you the sum of observations per column (Remember: True = 1). Try looking at the .isnull() method.

In [35]:
# Number of rows where 'quality' is NaN
red_df['quality'].isnull().sum()

0

In [36]:
# Number of NaN values in each column of the DataFrame
red_df.isnull().sum()

fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64
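
The wine data has no missing values, so every count above is zero. To see the same calls on data that does have gaps, here is a sketch with a small made-up table; the values and variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical table with two missing cells.
df = pd.DataFrame({
    "pH": [3.51, np.nan, 3.26, 3.16],
    "alcohol": [9.4, 9.8, np.nan, 9.8],
    "quality": [5, 5, 5, 6],
})

# True counts as 1, so summing the boolean mask counts the NaNs per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Rows that contain at least one NaN.
rows_with_gaps = df[df.isnull().any(axis=1)]
print(rows_with_gaps)

# dropna() returns a copy that keeps only the complete rows.
complete_rows = df.dropna()
print(len(complete_rows))
```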

## Aggregate by Columns

In [37]:
red_df['quality'].describe()

count    1599.000000
mean        5.636023
std         0.807569
min         3.000000
25%         5.000000
50%         6.000000
75%         6.000000
max         8.000000
Name: quality, dtype: float64

In [38]:
red_df['quality'].sum()

9012

### Convert Dictionary to Dataframe

In [39]:
data = {'ID': ['01', '02', '03', '04', '05'],
        'Age': [11, 21, 31, 41, 51],
        'Income': [5.6, 4.5, 7.8, 10.2, 12.1],
        'Name': ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon'],
        'Own House': [True, False, True, True, False]}
print(data)
print(type(data))


{'ID': ['01', '02', '03', '04', '05'], 'Age': [11, 21, 31, 41, 51], 'Income': [5.6, 4.5, 7.8, 10.2, 12.1], 'Name': ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon'], 'Own House': [True, False, True, True, False]}
<class 'dict'>


In [40]:
dataframe = pd.DataFrame(data)
print(dataframe)
print(type(dataframe))
dataframe.info()

   ID  Age  Income     Name  Own House
0  01   11     5.6    Alpha       True
1  02   21     4.5     Beta      False
2  03   31     7.8    Gamma       True
3  04   41    10.2    Delta       True
4  05   51    12.1  Epsilon      False
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
ID           5 non-null object
Age          5 non-null int64
Income       5 non-null float64
Name         5 non-null object
Own House    5 non-null bool
dtypes: bool(1), float64(1), int64(1), object(2)
memory usage: 293.0+ bytes
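
The dictionary above stores one list per column. Data also often arrives as one dictionary per row, and `pd.DataFrame` accepts that shape too. The sketch below shows this with two made-up records and converts the result back into a column-oriented dictionary with `to_dict(orient="list")`.

```python
import pandas as pd

# One dictionary per row (hypothetical records).
records = [
    {"ID": "01", "Age": 11},
    {"ID": "02", "Age": 21},
]

# The keys become column names; each dictionary becomes one row.
from_records = pd.DataFrame(records)
print(from_records)

# Convert back to a dictionary that holds one list per column.
back = from_records.to_dict(orient="list")
print(back)
```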
