# Introduction to Pandas

---------------------------------------------------------------------------------------------------------------------------
## Author: Srushti Shimpi
[LinkedIn Profile](https://www.linkedin.com/in/srushti-shimpi77/)

---------------------------------------------------------------------------------------------------------------------------
### Installation of Pandas

* If you are using **Conda**, install the library by typing the following command in the Anaconda prompt:
```
conda install pandas
```

* If you are using **pip**, install the library by typing the following command in a terminal:
```
pip install pandas
```

---------------------------------------------------------------------------------------------------------------------------
### Importing the libraries

In [247]:
import pandas as pd
import numpy as np

---------------------------------------------------------------------------------------------------------------------------
### Different types of Data Structures in Pandas

Pandas provides the following data structures:
* **Series**: One-dimensional labeled arrays
* **DataFrame**: Two-dimensional arrays / data in tabular form

---------------------------------------------------------------------------------------------------------------------------

* **Series**

Syntax:
```
pandas.Series( data, index, dtype, copy)
```

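Data: The values to store: an array, a list, a dictionary, or a single constant value <br>
Index: Labels for the values. If no index is passed, pandas uses 0, 1, 2, ... <br>
dtype: The data type of the values; inferred from the data if omitted <br>
copy: Whether to copy the input data. Default False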

In [248]:
#Example1
#creating a series with default index value
data1 = ['a', 'b', 'c', 'd', 'e']
s1 = pd.Series(data1)
s1

0    a
1    b
2    c
3    d
4    e
dtype: object

In [249]:
#Example2
#creating a series with predefined index values
Index = ['x1', 'x2', 'x3', 'x4', 'x5']
s2 = pd.Series(data1, Index)
s2

x1    a
x2    b
x3    c
x4    d
x5    e
dtype: object
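
The syntax above also accepts an array or a single constant as data. A minimal sketch of those cases follows; the variable names and values are made up for illustration and are not part of the original examples.

```python
import numpy as np
import pandas as pd

labels = ['x1', 'x2', 'x3']  # hypothetical index labels

# From a NumPy array, with explicit labels
from_array = pd.Series(np.array([10, 20, 30]), index=labels)

# From a constant: the value is repeated once per index label
from_constant = pd.Series(5, index=labels)

# dtype forces the element type (integers stored as floats here)
as_float = pd.Series([1, 2, 3], dtype='float64')

print(from_array['x2'])        # look up a value by its label
print(from_constant.tolist())
print(as_float.dtype)
```

When a constant is passed, the index is what tells pandas how long the Series should be.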

---------------------------------------------------------------------------------------------------------------------------
* **DataFrame**

Syntax:
```
pandas.DataFrame( data, index, columns, dtype, copy)
```

Data: The values to store: an array, a list, a dictionary, a Series, or a constant value <br>
Index: Row labels. If no index is passed, pandas uses 0, 1, 2, ... <br>
Columns: Column labels <br>
dtype: A single data type to apply to the data; inferred per column if omitted <br>
copy: Whether to copy the input data. Default False

In [250]:
#Creating a dataframe
data2 = [1,2,3,4,5]
df1 = pd.DataFrame(data2)
df1

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5


In [251]:
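#Creating a dataframe with column names; the Count column is inferred as int64.
#dtype=int is left out because it would apply to every column, and recent pandas versions reject it for the text column Fruit.
data3 = [['Apple',10],['Orange',12],['Pear',13]]
df2 = pd.DataFrame(data3,columns=['Fruit','Count'])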
df2

Unnamed: 0,Fruit,Count
0,Apple,10
1,Orange,12
2,Pear,13


In [252]:
#Creating a dataframe from a dictionary of lists
data4 = {'Name':['Rick', 'Megan', 'John', 'Jill'],
         'Age':[28,25,29,27]}
df3 = pd.DataFrame(data4)
df3


Unnamed: 0,Name,Age
0,Rick,28
1,Megan,25
2,John,29
3,Jill,27


In [253]:
#Creating a dataframe from a list of dictionaries; keys missing from a dictionary become NaN
data5 = [{'a': 41, 'b': 4},
         {'a': 3, 'b': 8, 'c': 0},
         {'b': 22, 'c': 16},
         {'a': 15, 'b': 4, 'c': 5}]
df4 = pd.DataFrame(data5, index=['first','second','third','fourth'])
df4

Unnamed: 0,a,b,c
first,41.0,4,
second,3.0,8,0.0
third,,22,16.0
fourth,15.0,4,5.0


In [254]:
#Creating a dataframe from a dictionary of Series; labels missing from a Series become NaN
data6 = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df5 = pd.DataFrame(data6)
df5

Unnamed: 0,one,two
a,1.0,1
b,2.0,2
c,3.0,3
d,,4


In [255]:
# Adding new column to df5 dataframe
df5['three'] = pd.Series(['x', 'y', 'z', 'w'], index=['a', 'b', 'c', 'd'])
df5

Unnamed: 0,one,two,three
a,1.0,1,x
b,2.0,2,y
c,3.0,3,z
d,,4,w


In [256]:
# Adding a column computed from existing columns, and a column filled with a constant
df5['four'] = df5['one']+ df5['two']
df5['five'] = 5
df5

Unnamed: 0,one,two,three,four,five
a,1.0,1,x,2.0,5
b,2.0,2,y,4.0,5
c,3.0,3,z,6.0,5
d,,4,w,,5


In [257]:
# Deleting a column from df5; drop() returns a new dataframe and leaves df5 itself unchanged
df5.drop(columns = ['five'])

# It can also be written as follows, selecting one or more columns for deletion:
# df5.drop(columns = ['four', 'five'])
# df5.drop(['four', 'five'], axis = 1)

Unnamed: 0,one,two,three,four
a,1.0,1,x,2.0
b,2.0,2,y,4.0
c,3.0,3,z,6.0
d,,4,w,
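
Because `drop` returns a new dataframe, the result has to be assigned to a name if the change should persist. The following is a small sketch on a made-up frame (`demo` is a hypothetical name, not one of the dataframes above).

```python
import pandas as pd

# Hypothetical frame used only for this illustration
demo = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'five': [5, 5]})

trimmed = demo.drop(columns=['five'])   # a new DataFrame without 'five'
print(list(demo.columns))               # the original still contains 'five'
print(list(trimmed.columns))

demo = demo.drop(columns=['five'])      # reassign to keep the change
print(list(demo.columns))
```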


In [258]:
# Deleting specific rows from df5 by index label (column 'five' is still present because the earlier drop was not assigned back)
df5.drop(['a', 'c'])

Unnamed: 0,one,two,three,four,five
b,2.0,2,y,4.0,5
d,,4,w,,5


In [259]:
#Row selection by label using the loc indexer
df5.loc['b']

one      2
two      2
three    y
four     4
five     5
Name: b, dtype: object

In [260]:
#Row selection by integer position using the iloc indexer
df5.iloc[2]

one      3
two      3
three    z
four     6
five     5
Name: c, dtype: object

In [261]:
#Multiple rows can be selected by position with a slice (the ':' operator); the end position is excluded
df5[2:4]

Unnamed: 0,one,two,three,four,five
c,3.0,3,z,6.0,5
d,,4,w,,5
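
The difference between label-based and position-based selection matters most when slicing. The sketch below uses a small made-up frame (`demo` and its values are assumptions for illustration) to show that a `loc` slice includes both end labels while an `iloc` slice excludes the end position.

```python
import pandas as pd

# Hypothetical frame with string row labels
demo = pd.DataFrame({'value': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

by_label = demo.loc['b':'c']       # label slice: both 'b' and 'c' are included
by_position = demo.iloc[1:3]       # positional slice: positions 1 and 2 only
cell = demo.loc['c', 'value']      # a single cell by row label and column label

print(by_label.index.tolist())
print(by_position.index.tolist())
print(cell)
```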


In [262]:
#Creating a fruit dataframe that more rows are added to in the next cell
data7= [['Apple',10],['Orange',12],['Pear',13]]
df6 = pd.DataFrame(data7,columns=['Fruit','Count'])
df6

Unnamed: 0,Fruit,Count
0,Apple,10
1,Orange,12
2,Pear,13


In [263]:
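#Appending the rows of df7 to df6. pd.concat is used because DataFrame.append was removed in pandas 2.0;
#the original row labels are kept, so 0, 1, 2 repeat in the result
df7 = pd.DataFrame([['Strawberry', 5], ['Pineapple', 3], ['Grapes', 20]],columns=['Fruit','Count'])
df6 = pd.concat([df6, df7])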
df6

Unnamed: 0,Fruit,Count
0,Apple,10
1,Orange,12
2,Pear,13
0,Strawberry,5
1,Pineapple,3
2,Grapes,20
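
The combined result above repeats the row labels 0, 1, 2. If a fresh running index is preferred, `pd.concat` accepts `ignore_index=True`. This is a minimal sketch with hypothetical frame names (`first`, `second`), reusing a few of the fruit rows.

```python
import pandas as pd

first = pd.DataFrame([['Apple', 10], ['Orange', 12]], columns=['Fruit', 'Count'])
second = pd.DataFrame([['Grapes', 20]], columns=['Fruit', 'Count'])

# ignore_index=True discards the old row labels and renumbers from 0
combined = pd.concat([first, second], ignore_index=True)

print(combined.index.tolist())
print(combined)
```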


### Basic functionalities of Pandas

In [264]:
#This shows version of pandas library
pd.__version__

'0.25.1'

In [265]:
#To display pandas' built-in documentation (IPython/Jupyter syntax)
pd?

* **Reading the data set in different ways**

In [266]:
#Here we are reading .csv file
dfx = pd.read_csv("AB_NYC_2019.csv")
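
`read_csv` can also limit the columns it loads and parse date columns while reading. The sketch below reads from an in-memory string instead of the real file so that it runs on its own; the two rows are copied from the listing data shown in this notebook, and the name `sample` is made up.

```python
import io
import pandas as pd

# A tiny stand-in for the CSV file, held in memory
csv_text = """id,name,price,last_review
2539,Clean & quiet apt home by the park,149,2018-10-19
3647,THE VILLAGE OF HARLEM....NEW YORK !,150,
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['id', 'price', 'last_review'],  # load only these columns
    parse_dates=['last_review'],             # convert to datetimes; blanks become NaT
)

print(sample.dtypes)
print(sample)
```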

In [267]:
#To display first 5 rows by default
dfx.head()

#to display first specific number of rows
# dfx.head(15)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [268]:
#To display last 5 rows by default
dfx.tail()

#to display last specific number of rows
# dfx.tail(15)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
48890,36484665,Charming one bedroom - newly renovated rowhouse,8232441,Sabrina,Brooklyn,Bedford-Stuyvesant,40.67853,-73.94995,Private room,70,2,0,,,2,9
48891,36485057,Affordable room in Bushwick/East Williamsburg,6570630,Marisol,Brooklyn,Bushwick,40.70184,-73.93317,Private room,40,4,0,,,2,36
48892,36485431,Sunny Studio at Historical Neighborhood,23492952,Ilgar & Aysel,Manhattan,Harlem,40.81475,-73.94867,Entire home/apt,115,10,0,,,1,27
48893,36485609,43rd St. Time Square-cozy single bed,30985759,Taz,Manhattan,Hell's Kitchen,40.75751,-73.99112,Shared room,55,1,0,,,6,2
48894,36487245,Trendy duplex in the very heart of Hell's Kitchen,68119814,Christophe,Manhattan,Hell's Kitchen,40.76404,-73.98933,Private room,90,7,0,,,1,23


In [269]:
#To see the axis labels: the row index followed by the column labels
dfx.axes

[RangeIndex(start=0, stop=48895, step=1),
 Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
        'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
        'minimum_nights', 'number_of_reviews', 'last_review',
        'reviews_per_month', 'calculated_host_listings_count',
        'availability_365'],
       dtype='object')]

In [270]:
#To display the datatype of each column
dfx.dtypes

id                                  int64
name                               object
host_id                             int64
host_name                          object
neighbourhood_group                object
neighbourhood                      object
latitude                          float64
longitude                         float64
room_type                          object
price                               int64
minimum_nights                      int64
number_of_reviews                   int64
last_review                        object
reviews_per_month                 float64
calculated_host_listings_count      int64
availability_365                    int64
dtype: object

In [271]:
#Returns the Boolean value saying whether the Object is empty or not. True =  empty
dfx.empty

False

In [272]:
#To get the number of dimensions of the dataset
dfx.ndim

2

In [273]:
#To display total number of rows and columns
dfx.shape

(48895, 16)

In [274]:
#To display the number of elements in dataframe
dfx.size

782320

In [275]:
#To display the actual data in the dataframe as a NumPy array
dfx.values

array([[2539, 'Clean & quiet apt home by the park', 2787, ..., 0.21, 6,
        365],
       [2595, 'Skylit Midtown Castle', 2845, ..., 0.38, 2, 355],
       [3647, 'THE VILLAGE OF HARLEM....NEW YORK !', 4632, ..., nan, 1,
        365],
       ...,
       [36485431, 'Sunny Studio at Historical Neighborhood', 23492952,
        ..., nan, 1, 27],
       [36485609, '43rd St. Time Square-cozy single bed', 30985759, ...,
        nan, 6, 2],
       [36487245, "Trendy duplex in the very heart of Hell's Kitchen",
        68119814, ..., nan, 1, 23]], dtype=object)

In [276]:
#Returns the sum of the values in the column
dfx.minimum_nights.sum()

343730

In [277]:
#Returns the average value
dfx.minimum_nights.mean()

7.029962163820431

In [278]:
#Returns the sample standard deviation of the column (Bessel's correction, ddof=1)
dfx.minimum_nights.std()

20.51054953317987
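
Bessel's correction means that pandas divides by n - 1 rather than n by default. A short sketch on made-up numbers shows how the `ddof` parameter switches between the sample and the population standard deviation, and that NumPy's default matches the population version.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so the population result is a round number
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = values.std()               # pandas default: ddof=1 (divide by n - 1)
population_std = values.std(ddof=0)     # divide by n
numpy_std = np.std(values.to_numpy())   # NumPy default: ddof=0

print(sample_std, population_std, numpy_std)
```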

In [279]:
#Display information of all columns
dfx.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 16 columns):
id                                48895 non-null int64
name                              48879 non-null object
host_id                           48895 non-null int64
host_name                         48874 non-null object
neighbourhood_group               48895 non-null object
neighbourhood                     48895 non-null object
latitude                          48895 non-null float64
longitude                         48895 non-null float64
room_type                         48895 non-null object
price                             48895 non-null int64
minimum_nights                    48895 non-null int64
number_of_reviews                 48895 non-null int64
last_review                       38843 non-null object
reviews_per_month                 38843 non-null float64
calculated_host_listings_count    48895 non-null int64
availability_365                  48895 non-null int64

In [280]:
#Counts how often each distinct value occurs in the column, sorted from most to least frequent
dfx.minimum_nights.value_counts(dropna=False)

1       12720
2       11696
3        7999
30       3760
4        3303
        ...  
42          1
186         1
265         1
1000        1
364         1
Name: minimum_nights, Length: 109, dtype: int64
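
`value_counts` can report shares instead of raw counts through its `normalize` parameter. A minimal sketch on a made-up Series (`nights` is a hypothetical name):

```python
import pandas as pd

nights = pd.Series([1, 1, 2, 3, 1, 2])  # hypothetical values

counts = nights.value_counts()                 # occurrences per distinct value
shares = nights.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
```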

In [281]:
#Summarizing Data
dfx.describe()

Unnamed: 0,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
count,48895.0,48895.0,48895.0,48895.0,48895.0,48895.0,48895.0,38843.0,48895.0,48895.0
mean,19017140.0,67620010.0,40.728949,-73.95217,152.720687,7.029962,23.274466,1.373221,7.143982,112.781327
std,10983110.0,78610970.0,0.05453,0.046157,240.15417,20.51055,44.550582,1.680442,32.952519,131.622289
min,2539.0,2438.0,40.49979,-74.24442,0.0,1.0,0.0,0.01,1.0,0.0
25%,9471945.0,7822033.0,40.6901,-73.98307,69.0,1.0,1.0,0.19,1.0,0.0
50%,19677280.0,30793820.0,40.72307,-73.95568,106.0,3.0,5.0,0.72,1.0,45.0
75%,29152180.0,107434400.0,40.763115,-73.936275,175.0,5.0,24.0,2.02,2.0,227.0
max,36487240.0,274321300.0,40.91306,-73.71299,10000.0,1250.0,629.0,58.5,327.0,365.0


In [282]:
#Minimum Value
dfx.minimum_nights.min()

1

In [283]:
#Maximum value
dfx.minimum_nights.max()

1250

In [284]:
#Median of Values
dfx.minimum_nights.median()

3.0

In [285]:
#Mode of values
dfx.minimum_nights.mode()

0    1
dtype: int64

In [286]:
#Product of values. The 0 below is not the true product: the result overflows int64 and wraps around
dfx.minimum_nights.prod()

0
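
Every value in `minimum_nights` is at least 1, so a product of 0 can only come from integer overflow. The sketch below reproduces the effect on two made-up values and shows one possible workaround, converting to floats first, which trades exactness for range.

```python
import pandas as pd

# Two hypothetical values whose true product, 2**80, does not fit in int64
big = pd.Series([2**40, 2**40], dtype='int64')

wrapped = big.prod()                     # silently wraps around in int64
as_float = big.astype('float64').prod()  # approximate, but in range

print(wrapped)
print(as_float)
```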

In [287]:
#Absolute value
dfx.minimum_nights.abs()

0         1
1         1
2         3
3         1
4        10
         ..
48890     2
48891     4
48892    10
48893     1
48894     7
Name: minimum_nights, Length: 48895, dtype: int64

In [288]:
#Cumulative sum
dfx.minimum_nights.cumsum()

0             1
1             2
2             5
3             6
4            16
          ...  
48890    343708
48891    343712
48892    343722
48893    343723
48894    343730
Name: minimum_nights, Length: 48895, dtype: int64

In [289]:
#Cumulative product (the trailing zeros come from int64 overflow, not from zero values)
dfx.minimum_nights.cumprod()

0         1
1         1
2         3
3         3
4        30
         ..
48890     0
48891     0
48892     0
48893     0
48894     0
Name: minimum_nights, Length: 48895, dtype: int64

In [290]:
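#Returns the pairwise correlation between the numeric columns of a dataframe
#(the text columns are excluded explicitly so this also runs on recent pandas versions)
dfx.select_dtypes(include='number').corr()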

Unnamed: 0,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
id,1.0,0.58829,-0.003125,0.090908,0.010619,-0.013224,-0.31976,0.291828,0.133272,0.085468
host_id,0.58829,1.0,0.020224,0.127055,0.015309,-0.017364,-0.140106,0.296417,0.15495,0.203492
latitude,-0.003125,0.020224,1.0,0.084788,0.033939,0.024869,-0.015389,-0.010142,0.019517,-0.010983
longitude,0.090908,0.127055,0.084788,1.0,-0.150019,-0.062747,0.059094,0.145948,-0.114713,0.082731
price,0.010619,0.015309,0.033939,-0.150019,1.0,0.042799,-0.047954,-0.030608,0.057472,0.081829
minimum_nights,-0.013224,-0.017364,0.024869,-0.062747,0.042799,1.0,-0.080116,-0.121702,0.12796,0.144303
number_of_reviews,-0.31976,-0.140106,-0.015389,0.059094,-0.047954,-0.080116,1.0,0.549868,-0.072376,0.172028
reviews_per_month,0.291828,0.296417,-0.010142,0.145948,-0.030608,-0.121702,0.549868,1.0,-0.009421,0.185791
calculated_host_listings_count,0.133272,0.15495,0.019517,-0.114713,0.057472,0.12796,-0.072376,-0.009421,1.0,0.225701
availability_365,0.085468,0.203492,-0.010983,0.082731,0.081829,0.144303,0.172028,0.185791,0.225701,1.0


In [291]:
#Returns the pairwise covariance between the numeric columns of a dataframe
dfx.select_dtypes(include='number').cov()

Unnamed: 0,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
id,120628700000000.0,507925500000000.0,-1871.769376,46085.553524,28008200.0,-2979067.0,-156460000.0,5244188.0,48233780.0,123553800.0
host_id,507925500000000.0,6179684000000000.0,86694.255843,461010.16637,289016200.0,-27997340.0,-490674100.0,37800870.0,401387500.0,2105527000.0
latitude,-1871.769,86694.26,0.002974,0.000213,0.4444481,0.02781484,-0.03738474,-0.0009371649,0.0350708,-0.07883238
longitude,46085.55,461010.2,0.000213,0.00213,-1.662923,-0.05940269,0.1215161,0.01145217,-0.1744759,0.5026104
price,28008200.0,289016200.0,0.444448,-1.662923,57674.03,210.8164,-513.0627,-10.13001,454.8128,2586.58
minimum_nights,-2979067.0,-27997340.0,0.027815,-0.059403,210.8164,420.6826,-73.20661,-3.555423,86.48462,389.5671
number_of_reviews,-156460000.0,-490674100.0,-0.037385,0.121516,-513.0627,-73.20661,1984.754,44.52519,-106.252,1008.744
reviews_per_month,5244188.0,37800870.0,-0.000937,0.011452,-10.13001,-3.555423,44.52519,2.823885,-0.4163055,40.44494
calculated_host_listings_count,48233780.0,401387500.0,0.035071,-0.174476,454.8128,86.48462,-106.252,-0.4163055,1085.868,978.9314
availability_365,123553800.0,2105527000.0,-0.078832,0.50261,2586.58,389.5671,1008.744,40.44494,978.9314,17324.43


In [292]:
#Returns the number of non-null values in each column
dfx.count()

id                                48895
name                              48879
host_id                           48895
host_name                         48874
neighbourhood_group               48895
neighbourhood                     48895
latitude                          48895
longitude                         48895
room_type                         48895
price                             48895
minimum_nights                    48895
number_of_reviews                 48895
last_review                       38843
reviews_per_month                 38843
calculated_host_listings_count    48895
availability_365                  48895
dtype: int64

### Filter, sort and Groupby

In [293]:
#Displays the first 5 rows where minimum_nights is greater than 200 (filtering with a
# boolean condition)
dfx[dfx['minimum_nights'] > 200].head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
700,258690,CHELSEA 1 Bdrm Plus Sleeping Loft!!,1359611,Andrea,Manhattan,Chelsea,40.74618,-74.00392,Entire home/apt,195,365,10,2014-10-26,0.12,1,0
754,271694,"Easy, comfortable studio in Midtown",1387370,James,Manhattan,Midtown,40.75282,-73.97315,Entire home/apt,125,365,19,2015-09-08,0.21,1,365
970,387324,Cozy Room in Sunny Apartment (Long/Short Term),1828506,Yogi,Manhattan,Kips Bay,40.74238,-73.98122,Private room,74,240,15,2018-09-04,0.17,1,90
1305,568684,800sqft apartment with huge terrace,2798644,Alessandra,Brooklyn,Bushwick,40.70202,-73.92402,Entire home/apt,115,370,6,2018-04-15,0.09,1,365
1449,649561,Manhattan Sky Crib (1 year sublet),3260084,David,Manhattan,Chelsea,40.75164,-73.99425,Entire home/apt,135,365,0,,,1,365
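
Several conditions can be combined with `&` (and) or `|` (or), as long as each condition is wrapped in parentheses, and `isin` tests membership in a list. The following sketch uses a small made-up frame (`listings`) with the same column names as the dataset.

```python
import pandas as pd

# Hypothetical rows that mimic a few columns of the dataset
listings = pd.DataFrame({
    'room_type': ['Private room', 'Entire home/apt', 'Shared room', 'Entire home/apt'],
    'price': [149, 225, 55, 89],
    'minimum_nights': [1, 1, 1, 365],
})

# Both conditions must hold; the parentheses are required
cheap_short = listings[(listings['price'] < 100) & (listings['minimum_nights'] <= 7)]

# Keep rows whose room_type is one of the listed values
rooms = listings[listings['room_type'].isin(['Private room', 'Shared room'])]

print(cheap_short)
print(rooms)
```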


In [294]:
# sorting: sort_values sorts value in ascending order by default
dfx.sort_values('minimum_nights').head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
16078,12990578,Cozy Room with Window View near Times Square,49447536,Wing Yan,Manhattan,Hell's Kitchen,40.75484,-73.9922,Private room,100,1,1,2016-06-08,0.03,1,0
16090,13001082,Share space in E Harlem,4112409,Rick,Manhattan,East Harlem,40.79193,-73.9439,Shared room,45,1,52,2019-06-19,1.41,1,347


In [295]:
# To sort column in descending order
dfx.sort_values('minimum_nights',ascending=False).head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
5767,4204302,Prime W. Village location 1 bdrm,17550546,Genevieve,Manhattan,Greenwich Village,40.73293,-73.99782,Entire home/apt,180,1250,2,2014-11-09,0.03,1,365
2854,1615764,,6676776,Peter,Manhattan,Battery Park City,40.71239,-74.0162,Entire home/apt,400,1000,0,,,1,362
38664,30378211,Shared Studio (females only),200401254,Meg,Manhattan,Greenwich Village,40.73094,-73.999,Shared room,110,999,0,,,1,365


In [296]:
#Sorts by minimum_nights in ascending order, then breaks ties by price in descending order
dfx.sort_values(['minimum_nights','price'],ascending=[True,False]).head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
4377,2953058,Film Location,1177497,Jessica,Brooklyn,Clinton Hill,40.69137,-73.96723,Entire home/apt,8000,1,1,2016-09-15,0.03,11,365
29662,22779726,East 72nd Townhouse by (Hidden by Airbnb),156158778,Sally,Manhattan,Upper East Side,40.76824,-73.95989,Entire home/apt,7703,1,0,,,12,146
42523,33007610,70' Luxury MotorYacht on the Hudson,7407743,Jack,Manhattan,Battery Park City,40.71162,-74.01693,Entire home/apt,7500,1,0,,,1,364


In [297]:
#Maps each category in the given column to the row labels that belong to it. You can group on one or more columns.
dfx.groupby(['room_type']).groups

{'Entire home/apt': Int64Index([    1,     3,     4,     5,     9,    10,    14,    15,    16,
                18,
             ...
             48861, 48866, 48870, 48872, 48873, 48879, 48880, 48886, 48887,
             48892],
            dtype='int64', length=25409),
 'Private room': Int64Index([    0,     2,     6,     7,     8,    11,    12,    13,    17,
                21,
             ...
             48881, 48882, 48883, 48884, 48885, 48888, 48889, 48890, 48891,
             48894],
            dtype='int64', length=22326),
 'Shared room': Int64Index([   39,   203,   357,   492,   545,   944,   975,  1103,  1175,
              1299,
             ...
             48713, 48716, 48720, 48729, 48785, 48832, 48855, 48867, 48868,
             48893],
            dtype='int64', length=1160)}
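
Grouping is usually followed by an aggregation that summarizes each group. A minimal sketch, again on a made-up frame (`listings`) rather than the real dataset:

```python
import pandas as pd

# Hypothetical rows that mimic two columns of the dataset
listings = pd.DataFrame({
    'room_type': ['Private room', 'Entire home/apt', 'Shared room', 'Entire home/apt'],
    'price': [149, 225, 55, 89],
})

# One row per room type, one column per aggregate
by_room = listings.groupby('room_type')['price'].agg(['count', 'mean', 'max'])

print(by_room)
```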

### Data Cleaning

In [298]:
#Displays number of empty/null cells in each column
dfx.isnull().sum()

id                                    0
name                                 16
host_id                               0
host_name                            21
neighbourhood_group                   0
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                       10052
reviews_per_month                 10052
calculated_host_listings_count        0
availability_365                      0
dtype: int64

In [299]:
#Displays number of non null cells in each column
dfx.notnull().sum()

id                                48895
name                              48879
host_id                           48895
host_name                         48874
neighbourhood_group               48895
neighbourhood                     48895
latitude                          48895
longitude                         48895
room_type                         48895
price                             48895
minimum_nights                    48895
number_of_reviews                 48895
last_review                       38843
reviews_per_month                 38843
calculated_host_listings_count    48895
availability_365                  48895
dtype: int64

In [300]:
#Dropping empty name rows using dropna function
dfx1 = dfx.dropna(subset=['name'], axis=0)
dfx1.head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365


In [301]:
#fillna fills empty values with a constant; it returns a new Series, so dfx1 itself is unchanged
dfx1.last_review.fillna('2018-01-01')

0        2018-10-19
1        2019-05-21
2        2018-01-01
3        2019-07-05
4        2018-11-19
            ...    
48890    2018-01-01
48891    2018-01-01
48892    2018-01-01
48893    2018-01-01
48894    2018-01-01
Name: last_review, Length: 48879, dtype: object

In [305]:
#Replacing null values from reviews_per_month with mean
dfx1.reviews_per_month.fillna(dfx1.reviews_per_month.mean(axis=0))

0        0.21000
1        0.38000
2        1.37341
3        4.64000
4        0.10000
          ...   
48890    1.37341
48891    1.37341
48892    1.37341
48893    1.37341
48894    1.37341
Name: reviews_per_month, Length: 48879, dtype: float64
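
Both `fillna` calls above return new objects, so the results must be assigned if they should be kept. One way to fill several columns in a single step is to pass a dictionary of per-column fill values. The sketch uses a made-up frame (`reviews`) with the two column names from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column
reviews = pd.DataFrame({
    'last_review': ['2018-10-19', None, '2019-07-05'],
    'reviews_per_month': [0.21, np.nan, 4.64],
})

# A different fill value per column; the result is a new DataFrame
filled = reviews.fillna({
    'last_review': '2018-01-01',
    'reviews_per_month': reviews['reviews_per_month'].mean(),
})

print(filled)
print(reviews['reviews_per_month'].isna().sum())  # the original is unchanged
```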

In [302]:
#Renaming a column (rename returns a new dataframe; dfx1 keeps the old name)
dfx1.rename(columns={'last_review': 'last_review_date'}).head(1)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review_date,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365


In [303]:
#Replacing values wherever they occur in the dataframe (see the room_type column in the output)
dfx1.replace('Entire home/apt','Apartment').head(4)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Apartment,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Apartment,89,1,270,2019-07-05,4.64,1,194
