# Pandas
<li>Pandas is an open-source Python package, built on top of NumPy, for working with data sets.</li> 
<li>The name "Pandas" is derived from the econometrics term <b>"panel data"</b> and is also a play on <b>"Python Data Analysis".</b></li>
<li>Pandas is one of the most widely used data-wrangling packages in Python.</li>
<li>Pandas offers easy-to-use data structures and analysis tools for analyzing, cleaning, exploring and manipulating data.</li>
<li>It also works well with other Python data science libraries such as NumPy and Matplotlib.</li>


# Difference Between NumPy & Pandas

![](images/pandas_vs_numpy.png)

## Why Use Pandas?

<li>Pandas is known for its exceptional ability to represent and organize data.</li>
<li>The Pandas library was designed to work with large tabular datasets quickly and efficiently.</li>
<li>It excels at analyzing large amounts of data, letting us explore big datasets and draw conclusions based on statistical methods.</li>
<li>Pandas can clean messy data sets and make them readable and relevant.</li>
<li>Because it is built on NumPy and integrates with Matplotlib for plotting, Pandas offers users a powerful tool for performing <b>data analytics and visualization.</b></li>
<li>Data can be imported into Pandas from a variety of sources, such as CSV files, SQL databases, Excel files and JSON, among others.</li>
<li>Pandas is a versatile and marketable skill for data analysts and data scientists that can gain the attention of employers.</li>


## Installation Of Pandas
<li>Open your terminal, activate your virtual environment and run the following command to install pandas:</li>

<code>
    pip install pandas
</code>

## Importing Pandas
<li>We need to import pandas before we can create a pandas DataFrame and analyze it.</li>
<li>By convention, the pandas package is imported under the alias pd:</li>
<code>
    import pandas as pd
</code>

In [23]:
import pandas as pd

## How To Create A Pandas DataFrame
<li>A Pandas DataFrame is a two-dimensional data structure, similar to a 2D array, that arranges data in a table with labeled rows and columns.</li>
<li>We can create a basic pandas DataFrame in several ways.</li>
<li>Let's look at some of the ways to create the DataFrame shown below:</li>

![](images/dataframe.png)

### 1. From a Python dictionary

In [2]:
df1 = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam',
                             'Sita', 'Mahima', 'Sunil', 'Bhawana'],
                   'Age': [24,34,50,32,18,23,22],
                   'Address': ['Manigram', 'Dhanewa', 'Bardaghat', 'Manglapur',
                              'Bharatpur', 'Kathmandu', 'Ramechap']})

In [3]:
df1

,Name,Age,Address
0,Prabhat,24,Manigram
1,Hari,34,Dhanewa
2,Shyam,50,Bardaghat
3,Sita,32,Manglapur
4,Mahima,18,Bharatpur
5,Sunil,23,Kathmandu
6,Bhawana,22,Ramechap


### 2. From a list of dictionaries

In [5]:
df2 = pd.DataFrame([{'Name': 'Prabhat', 'Age': 24, 'Address' : 'Manigram'},
                    {'Name': 'Hari', 'Age': 34, 'Address' : 'Dhanewa'},
                    {'Name': 'Shyam', 'Age': 50, 'Address' : 'Bardaghat'},
                    {'Name': 'Sita', 'Age': 32, 'Address' : 'Manglapur'},
                    {'Name': 'Mahima', 'Age': 18, 'Address' : 'Bharatpur'},
                    {'Name': 'Sunil', 'Age': 23, 'Address' : 'Kathmandu'},
                    {'Name': 'Bhawana', 'Age': 22, 'Address' : 'Ramechap'}
                   ])


In [6]:
df2

,Name,Age,Address
0,Prabhat,24,Manigram
1,Hari,34,Dhanewa
2,Shyam,50,Bardaghat
3,Sita,32,Manglapur
4,Mahima,18,Bharatpur
5,Sunil,23,Kathmandu
6,Bhawana,22,Ramechap


### 3. From a list of tuples

In [9]:
df3 = pd.DataFrame([('Prabhat', 24, 'Manigram'),
                    ('Hari', 34, 'Dhanewa'),
                    ('Shyam',50, 'Bardaghat'),
                    ('Sita', 32, 'Manglapur'),
                    ('Mahima', 18, 'Bharatpur'),
                    ('Sunil', 23, 'Kathmandu'),
                    ('Bhawana', 22, 'Ramechap')
                   ], columns = ['Name', 'Age', 'Address'])

In [10]:
df3

,Name,Age,Address
0,Prabhat,24,Manigram
1,Hari,34,Dhanewa
2,Shyam,50,Bardaghat
3,Sita,32,Manglapur
4,Mahima,18,Bharatpur
5,Sunil,23,Kathmandu
6,Bhawana,22,Ramechap


### 4. From a list of lists

In [11]:
df4 = pd.DataFrame([['Prabhat', 24, 'Manigram'],
                    ['Hari', 34, 'Dhanewa'],
                    ['Shyam',50, 'Bardaghat'],
                    ['Sita', 32, 'Manglapur'],
                    ['Mahima', 18, 'Bharatpur'],
                    ['Sunil', 23, 'Kathmandu'],
                    ['Bhawana', 22, 'Ramechap']
                   ], columns = ['Name', 'Age', 'Address'])

In [12]:
df4

,Name,Age,Address
0,Prabhat,24,Manigram
1,Hari,34,Dhanewa
2,Shyam,50,Bardaghat
3,Sita,32,Manglapur
4,Mahima,18,Bharatpur
5,Sunil,23,Kathmandu
6,Bhawana,22,Ramechap


#### Question:
<li>Read the 'weather_data.csv' file using csv.reader.</li>
<li>Store the data from the CSV file in a list of lists.</li>
<li>Then create a pandas DataFrame from the list of lists.</li>

In [14]:
from csv import reader

# Read every row of the CSV file as a list of strings; the file closes automatically
with open('weather_data.csv') as file:
    data = list(reader(file))
print(data)

[['day', 'temperature', 'windspeed', 'event'], ['1/1/2017', '32', '6', 'Rain'], ['1/4/2017', 'not available', '9', 'Sunny'], ['1/5/2017', '-1', 'not measured', 'Snow'], ['1/6/2017', 'not available', '7', 'no event'], ['1/7/2017', '32', 'not measured', 'Rain'], ['1/8/2017', 'not available', 'not measured', 'Sunny'], ['1/9/2017', 'not available', 'not measured', 'no event'], ['1/10/2017', '34', '8', 'Cloudy'], ['1/11/2017', '-4', '-1', 'Snow'], ['1/12/2017', '26', '12', 'Sunny'], ['1/13/2017', '12', '12', 'Rainy'], ['1/11/2017', '-1', '12', 'Snow'], ['1/14/2017', '40', '-1', 'Sunny']]


In [15]:
weather_df = pd.DataFrame(data[1:], columns = data[0])
weather_df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


#### Question
<li>1. Read the 'imports-85.data' file using a file reader.</li>
<li>2. Store the data present inside the file in a list of lists.</li>
<li>3. Create a pandas dataframe using list of lists.</li>
<li>4. For column name, we can use the columns variable given below.</li>

In [20]:
# Read the file line by line and split each line on commas
total_data = []
with open('imports-85.data', 'r') as file:
    for line in file:
        line = line.strip()   # drop the trailing newline
        if line:              # skip blank lines
            total_data.append(line.split(','))


In [21]:
print(total_data[:5])

In [None]:
columns = ['symboling', 'normalized_losses', 'make', 'fuel_type', 'aspiration', 'num_of_doors',
           'body_style', 'drive_wheels', 'engine_location', 'wheel_base', 'length', 'width',
           'height', 'curb_weight', 'engine_type', 'num_of_cylinders', 'engine_size', 'fuel_system',
           'bore', 'stroke', 'compression', 'horsepower', 'peak_rpm', 'city_mpg', 'highway_mpg',
           'price']
imports_df = pd.DataFrame(total_data, columns = columns)

### 5. Pandas DataFrame From CSV Files

<li>We can load a CSV file and create a DataFrame from the data it contains.</li>
<li>The <b>pd.read_csv()</b> function reads a CSV file and returns a pandas DataFrame.</li>

In [16]:
weather_df = pd.read_csv('weather_data.csv')

In [17]:
weather_df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


### Reading a csv file using skiprows and header parameters

In [26]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 1)

In [27]:
weather_df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


In [34]:
weather_df = pd.read_csv('weather_data.csv', header = 2)
weather_df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny
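The `skiprows` and `header` parameters are easiest to compare on a small example. The sketch below assumes a hypothetical CSV with two preamble lines above the real header row (the text is made up and held in memory with `io.StringIO`, not read from `weather_data.csv`). Skipping two lines, or declaring line 2 (0-based) as the header, should give the same DataFrame.

```python
import io

import pandas as pd

# Hypothetical CSV text: two preamble lines, then the real header row
csv_text = """Weather report
Source: example station
day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
"""

# Option 1: skip the two preamble lines; the next line becomes the header
df_skip = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# Option 2: use line 2 (0-based) as the header; earlier lines are discarded
df_header = pd.read_csv(io.StringIO(csv_text), header=2)

print(df_skip.columns.tolist())
print(df_skip.equals(df_header))
```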


#### Reading a csv file without header and giving names to the columns

In [40]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 3, header = None,
                        names = ['dates', 'temp', 'ws', 'forecast'])
weather_df

,dates,temp,ws,forecast
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


#### Reading a limited number of rows from a csv file using the nrows parameter


In [43]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 3,nrows = 5, header = None,
                        names = ['dates', 'temp', 'ws', 'forecast'])
weather_df

,dates,temp,ws,forecast
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
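A minimal sketch of `header = None`, `names` and `nrows` working together, using a hypothetical headerless CSV held in memory (the rows are copied from the weather data above, but the file itself is an assumption):

```python
import io

import pandas as pd

# Hypothetical headerless CSV text (data rows only)
csv_text = """1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
1/5/2017,-1,not measured,Snow
"""

# header=None: treat the first line as data; names supplies the column labels
# nrows=2: read only the first two data rows
df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=['dates', 'temp', 'ws', 'forecast'], nrows=2)
print(df)
```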


#### Reading csv files with the na_values parameter ('weather_data.csv' file)


In [52]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 2,
#                         na_values = ['not available', 'not measured', 
#                                     'no event']
                        )
weather_df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


In [24]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 2,
                        na_values = {'temperature': 'not available',
                                     'windspeed': ['not measured', -1],
                                    'event': 'no event'})
weather_df
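Here is a small in-memory sketch of what a per-column `na_values` dictionary does, using a few rows copied from the weather data (the file contents are an assumption). Only the markers listed for each column become NaN, so a `-1` in `temperature` is kept while a `-1` in `windspeed` is treated as missing, and columns that end up purely numeric are parsed as `float64`.

```python
import io

import pandas as pd

csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
1/5/2017,-1,not measured,Snow
1/6/2017,not available,7,no event
1/11/2017,-4,-1,Snow
"""

# Each column gets its own list of markers to treat as missing (NaN)
df = pd.read_csv(io.StringIO(csv_text),
                 na_values={'temperature': ['not available'],
                            'windspeed': ['not measured', -1],
                            'event': ['no event']})

print(df.isnull().sum())
print(df.dtypes)
```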

#### Write a pandas dataframe to a csv file
<li>We can write a pandas DataFrame to a CSV file using the .to_csv() method.</li>
<li>You can give the output file any name; index = False stops pandas from writing the row index as an extra column.</li>

In [57]:
weather_df.to_csv('weather_data_nan.csv', index = False)
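To see what `index = False` changes: when `to_csv()` is called without a path it returns the CSV text as a string, so both variants can be compared directly. The two-row DataFrame below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'day': ['1/1/2017', '1/4/2017'], 'temperature': [32.0, None]})

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first column holds the row labels 0 and 1
print(without_index)  # only the real columns; NaN is written as an empty field
```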

### 6. Pandas DataFrame From Excel Files

<li>We can load an Excel file with the <b>.xlsx</b> extension and create a DataFrame from the data it contains.</li>
<li>The <b>pd.read_excel()</b> function reads an Excel file and creates a pandas DataFrame from it.</li>
<li>Reading and writing .xlsx files also requires the <b>openpyxl</b> package (pip install openpyxl).</li>

In [4]:
weather_df = pd.read_excel('weather_data.xlsx',
                           na_values = {'temperature': 'not available',
                                     'windspeed': ['not measured', -1],
                                    'event': 'no event'})
weather_df

,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain
5,1/8/2017,,,Sunny
6,1/9/2017,,,
7,1/10/2017,34.0,8.0,Cloudy
8,1/11/2017,-4.0,,Snow
9,1/12/2017,26.0,12.0,Sunny


#### Writing to an excel file
<li>We can write a pandas DataFrame to an Excel file using the .to_excel() method; the sheet_name parameter sets the name of the worksheet.</li>

In [5]:
# Write to a new file so the source workbook is not overwritten
weather_df.to_excel('weather_data_nan.xlsx', sheet_name = 'nans')

#### Using the head() and tail() methods to see the first and last rows
<li>To view the first few rows of our dataframe, we can use the DataFrame.head() method.</li>
<li>By default, it returns the first five rows of our dataframe.</li>
<li>However, it also accepts an optional integer parameter, which specifies the number of rows.</li>

<li>Similarly, to view the last few rows of our dataframe, we can use the DataFrame.tail() method.</li>
<li>By default, it returns the last five rows of our dataframe.</li>
<li>However, it also accepts an optional integer parameter, which specifies the number of rows.</li>

In [28]:
weather_df = pd.read_csv('weather_data.csv', skiprows = 2)
weather_df.head(3)

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow


In [29]:
weather_df.tail(3)

,day,temperature,windspeed,event
10,1/13/2017,12,12,Rainy
11,1/11/2017,-1,12,Snow
12,1/14/2017,40,-1,Sunny


#### Question:

<li>Use the head() method to select the first 6 rows.</li>
<li>Use the tail() method to select the last 8 rows.</li>

In [31]:
weather_df.head(6)

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny


In [32]:
weather_df.tail(8)

,day,temperature,windspeed,event
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny
10,1/13/2017,12,12,Rainy
11,1/11/2017,-1,12,Snow
12,1/14/2017,40,-1,Sunny


#### Finding the column names from the dataframe
<li>The df.columns attribute returns the names of the columns in a pandas DataFrame.</li>
<li>Similarly, the df.values attribute returns the data in the DataFrame as a NumPy array.</li>

In [33]:
weather_df.columns

Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')

In [34]:
print(type(weather_df.columns))

<class 'pandas.core.indexes.base.Index'>


In [36]:
weather_df.columns[-2:]

Index(['windspeed', 'event'], dtype='object')

In [38]:
list(weather_df.columns)[:2]

['day', 'temperature']

In [39]:
weather_df.values

array([['1/1/2017', '32', '6', 'Rain'],
       ['1/4/2017', 'not available', '9', 'Sunny'],
       ['1/5/2017', '-1', 'not measured', 'Snow'],
       ['1/6/2017', 'not available', '7', 'no event'],
       ['1/7/2017', '32', 'not measured', 'Rain'],
       ['1/8/2017', 'not available', 'not measured', 'Sunny'],
       ['1/9/2017', 'not available', 'not measured', 'no event'],
       ['1/10/2017', '34', '8', 'Cloudy'],
       ['1/11/2017', '-4', '-1', 'Snow'],
       ['1/12/2017', '26', '12', 'Sunny'],
       ['1/13/2017', '12', '12', 'Rainy'],
       ['1/11/2017', '-1', '12', 'Snow'],
       ['1/14/2017', '40', '-1', 'Sunny']], dtype=object)

In [40]:
type(weather_df.values)

numpy.ndarray

In [42]:
weather_df.values.shape

(13, 4)

In [43]:
weather_df.values.ndim

2

In [44]:
weather_df.size

52

In [45]:
weather_df.values[-5:]

array([['1/11/2017', '-4', '-1', 'Snow'],
       ['1/12/2017', '26', '12', 'Sunny'],
       ['1/13/2017', '12', '12', 'Rainy'],
       ['1/11/2017', '-1', '12', 'Snow'],
       ['1/14/2017', '40', '-1', 'Sunny']], dtype=object)

In [51]:
weather_df.values[weather_df.values[:,-1] == 'Sunny']

array([['1/4/2017', 'not available', '9', 'Sunny'],
       ['1/8/2017', 'not available', 'not measured', 'Sunny'],
       ['1/12/2017', '26', '12', 'Sunny'],
       ['1/14/2017', '40', '-1', 'Sunny']], dtype=object)

In [53]:
weather_df.values[weather_df.values[:,1] == 'not available']

array([['1/4/2017', 'not available', '9', 'Sunny'],
       ['1/6/2017', 'not available', '7', 'no event'],
       ['1/8/2017', 'not available', 'not measured', 'Sunny'],
       ['1/9/2017', 'not available', 'not measured', 'no event']],
      dtype=object)

In [56]:
weather_df.values[weather_df.values[:,2] == 'not measured']

array([['1/5/2017', '-1', 'not measured', 'Snow'],
       ['1/7/2017', '32', 'not measured', 'Rain'],
       ['1/8/2017', 'not available', 'not measured', 'Sunny'],
       ['1/9/2017', 'not available', 'not measured', 'no event']],
      dtype=object)

In [58]:
weather_df.values[weather_df.values[:,-1] == 'no event']

array([['1/6/2017', 'not available', '7', 'no event'],
       ['1/9/2017', 'not available', 'not measured', 'no event']],
      dtype=object)
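The NumPy-style filtering above works, but the same selection can be written directly on the DataFrame with a boolean mask, which keeps the row labels and column names. The sketch below uses a few hypothetical rows shaped like the weather data; it also uses `to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['1/1/2017', '1/4/2017', '1/5/2017', '1/12/2017'],
    'temperature': ['32', 'not available', '-1', '26'],
    'event': ['Rain', 'Sunny', 'Snow', 'Sunny'],
})

# Boolean mask built from a column; the result is still a labelled DataFrame
sunny = df[df['event'] == 'Sunny']
print(sunny)

# Equivalent NumPy-style filtering on the underlying array
arr = df.to_numpy()
print(arr[arr[:, -1] == 'Sunny'])
```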

#### Checking the data types of your dataframe columns
<li>Another feature that makes pandas better for working with data is that dataframes can contain more than one data type.</li>
<li>Axis values can have string labels, not just numeric ones.</li>
<li>Different columns can hold different data types, including integer, float and string.</li>
<li>We can use the DataFrame.dtypes attribute (similar to NumPy) to return information about the types of each column.</li>
<li>When we import data, pandas attempts to guess the correct dtype for each column.</li>
<li>Generally, pandas does well with this, which means we don't need to worry about specifying dtypes every time we start to work with data.</li>



In [60]:
weather_df.dtypes

day            object
temperature    object
windspeed      object
event          object
dtype: object

In [61]:
weather_df_nan = pd.read_csv('weather_data_nan.csv')
weather_df_nan.head()

,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain


In [62]:
weather_df_nan.dtypes

day             object
temperature    float64
windspeed      float64
event           object
dtype: object

#### Shape and Datatypes Information
<li>We can get the shape of the dataset using the <b>.shape</b> attribute (no parentheses).</li>
<li><b>.shape</b> returns a tuple containing the number of rows and the number of columns in the dataset.</li>
<li>For an overview of all the dtypes used in our dataframe, along with non-null counts and memory usage, we can use the <b>.info()</b> method.</li>
<li>Note that <b>DataFrame.info()</b> prints the information rather than returning it, so assigning its result to a variable just gives <b>None</b>.</li>
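A short sketch on a hypothetical two-column DataFrame, showing that `.shape` is read without parentheses and that `info()` returns `None` after printing its summary:

```python
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, None, -1.0],
                   'event': ['Rain', 'Sunny', None]})

# shape is an attribute: (number of rows, number of columns)
rows, cols = df.shape
print(rows, cols)

# info() prints a summary to stdout and returns None
summary = df.info()
print(summary is None)
```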


In [64]:
weather_df_nan.shape

(13, 4)

In [66]:
weather_df_nan.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   day          13 non-null     object 
 1   temperature  9 non-null      float64
 2   windspeed    7 non-null      float64
 3   event        11 non-null     object 
dtypes: float64(2), object(2)
memory usage: 544.0+ bytes


#### Checking the null values in the pandas dataframe

In [68]:
weather_df_nan.isnull().sum()

day            0
temperature    4
windspeed      6
event          2
dtype: int64

#### set_index() and reset_index() methods

In [71]:
weather_df_nan = pd.read_csv('weather_data_nan.csv',
                             parse_dates = ['day'])
weather_df_nan.head()

,day,temperature,windspeed,event
0,2017-01-01,32.0,6.0,Rain
1,2017-01-04,,9.0,Sunny
2,2017-01-05,-1.0,,Snow
3,2017-01-06,,7.0,
4,2017-01-07,32.0,,Rain


In [72]:
weather_df_nan.dtypes

day            datetime64[ns]
temperature           float64
windspeed             float64
event                  object
dtype: object

In [73]:
day_index_weather_df = weather_df_nan.set_index('day')
day_index_weather_df

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,,9.0,Sunny
2017-01-05,-1.0,,Snow
2017-01-06,,7.0,
2017-01-07,32.0,,Rain
2017-01-08,,,Sunny
2017-01-09,,,
2017-01-10,34.0,8.0,Cloudy
2017-01-11,-4.0,,Snow
2017-01-12,26.0,12.0,Sunny


In [75]:
weather_df_nan.set_index('day', inplace = True)
weather_df_nan

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,,9.0,Sunny
2017-01-05,-1.0,,Snow
2017-01-06,,7.0,
2017-01-07,32.0,,Rain
2017-01-08,,,Sunny
2017-01-09,,,
2017-01-10,34.0,8.0,Cloudy
2017-01-11,-4.0,,Snow
2017-01-12,26.0,12.0,Sunny


In [78]:
weather_df_nan.reset_index(inplace = True)
weather_df_nan

,day,temperature,windspeed,event
0,2017-01-01,32.0,6.0,Rain
1,2017-01-04,,9.0,Sunny
2,2017-01-05,-1.0,,Snow
3,2017-01-06,,7.0,
4,2017-01-07,32.0,,Rain
5,2017-01-08,,,Sunny
6,2017-01-09,,,
7,2017-01-10,34.0,8.0,Cloudy
8,2017-01-11,-4.0,,Snow
9,2017-01-12,26.0,12.0,Sunny


In [85]:
temperature_index_df = weather_df_nan.set_index('temperature')
temperature_index_df.head()

temperature,day,windspeed,event
32.0,2017-01-01,6.0,Rain
,2017-01-04,9.0,Sunny
-1.0,2017-01-05,,Snow
,2017-01-06,7.0,
32.0,2017-01-07,,Rain


In [86]:
temperature_reset_index_df = temperature_index_df.reset_index(drop = True)

In [88]:
temperature_reset_index_df.reset_index(inplace = True)
temperature_reset_index_df

,index,day,windspeed,event
0,0,2017-01-01,6.0,Rain
1,1,2017-01-04,9.0,Sunny
2,2,2017-01-05,,Snow
3,3,2017-01-06,7.0,
4,4,2017-01-07,,Rain
5,5,2017-01-08,,Sunny
6,6,2017-01-09,,
7,7,2017-01-10,8.0,Cloudy
8,8,2017-01-11,,Snow
9,9,2017-01-12,12.0,Sunny


In [90]:
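# 'index' is already a column, so reset_index() inserts the old index as 'level_0'
# (with drop = True the old index would be discarded instead)
temperature_reset_index_df.reset_index(inplace = True)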
temperature_reset_index_df

,level_0,index,day,windspeed,event
0,0,0,2017-01-01,6.0,Rain
1,1,1,2017-01-04,9.0,Sunny
2,2,2,2017-01-05,,Snow
3,3,3,2017-01-06,7.0,
4,4,4,2017-01-07,,Rain
5,5,5,2017-01-08,,Sunny
6,6,6,2017-01-09,,
7,7,7,2017-01-10,8.0,Cloudy
8,8,8,2017-01-11,,Snow
9,9,9,2017-01-12,12.0,Sunny


#### Selecting a column from a pandas DataFrame

<li>Since pandas axes have labels, we can select data using those labels.</li> 
<li>Unlike in NumPy, we do not need to know the exact integer position of the data we want.</li>
<li>To do this, we can use the DataFrame.loc[] indexer. The syntax for DataFrame.loc[] is:</li>
<code>
df.loc[row_label, column_label]
</code>

<li>We can use the following shortcut to select a single column:</li>
<code>
df["column_name"]
</code>

<li>This style of selecting columns is very common.</li>
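A minimal sketch of both styles on the first three rows of the Name/Age/Address data from earlier. Note that label-based slices in `loc` include the end label.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam'],
                   'Age': [24, 34, 50],
                   'Address': ['Manigram', 'Dhanewa', 'Bardaghat']})

# loc selects by label: rows labelled 1 through 2 (inclusive), column 'Name'
print(df.loc[1:2, 'Name'])

# ':' selects every row, so this is the whole 'Age' column
ages_loc = df.loc[:, 'Age']

# Shortcut for selecting a single column by name
ages = df['Age']
print(ages.equals(ages_loc))

# A list of column names returns a DataFrame rather than a single column
print(df[['Name', 'Age']])
```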


#### Questions

<li>Read <b>'appointment_schedule.csv'</b> file using pandas.</li>
<li>Select the <b>'name'</b> column from the given dataset and store to <b>'appointment_names'</b> variable.</li>
<li>Use Python's <b>type()</b> function to assign the type of name column to <b>name_type</b>.</li>

#### Pandas Series
<li>Series is the pandas type for one-dimensional objects.</li>
<li>Any time you see a 1D pandas object, it is a Series; any time you see a 2D pandas object, it is a DataFrame.</li>
<li>Conceptually, a DataFrame is a collection of Series objects (one per column) that share the same index.</li>
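A quick way to check this, assuming a hypothetical DataFrame with a `name` column like the appointment data in the question above (the real contents of `appointment_schedule.csv` are not shown here):

```python
import pandas as pd

# Hypothetical stand-in for the appointment data
df = pd.DataFrame({'name': ['Prabhat', 'Hari', 'Sita'],
                   'age': [24, 34, 32]})

names = df['name']     # one column label -> 1D -> Series
subset = df[['name']]  # list of labels -> 2D -> DataFrame

print(type(names))   # <class 'pandas.core.series.Series'>
print(type(subset))  # <class 'pandas.core.frame.DataFrame'>

# Each column Series shares the DataFrame's index
print(names.index.equals(df.index))
```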

#### Datatype Conversion In Pandas
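As one common case, the weather data read without `na_values` stores its numeric columns as text because of entries such as 'not available'. A minimal sketch on a small hypothetical DataFrame: `pd.to_numeric(..., errors = 'coerce')` turns anything it cannot parse into NaN, while `astype()` works only when every value converts cleanly.

```python
import pandas as pd

df = pd.DataFrame({'temperature': ['32', 'not available', '-1'],
                   'windspeed': ['6', '9', '7']})
print(df.dtypes)  # both columns hold text

# Convert text to numbers; anything that cannot be parsed becomes NaN
df['temperature'] = pd.to_numeric(df['temperature'], errors='coerce')

# astype() is fine here because every windspeed value is a valid integer
df['windspeed'] = df['windspeed'].astype(int)

print(df.dtypes)
```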

#### Performing operations with your dataframe

<li>We can calculate the average value of a particular column using df.column_name.mean().</li>
<li>To calculate the minimum value in a particular column, we can use df.column_name.min().</li>
<li>Similarly, to calculate the maximum value in a particular column, we can use df.column_name.max().</li>
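A small sketch using hypothetical temperature values. By default these methods skip NaN values (`skipna = True`), so the mean below is taken over the four non-missing values.

```python
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, None, -1.0, 34.0, -4.0]})

print(df.temperature.mean())  # (32 - 1 + 34 - 4) / 4 = 15.25
print(df.temperature.min())   # -4.0
print(df.temperature.max())   # 34.0
```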

#### Finding the descriptive statistics of the dataframe using .describe() method
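`describe()` summarises each column. A minimal sketch on a hypothetical DataFrame: by default only numeric columns are summarised (count, mean, std, min, quartiles, max), while `include = 'all'` also reports count, unique, top and freq for text columns.

```python
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, None, -1.0, 34.0, -4.0],
                   'event': ['Rain', 'Sunny', 'Snow', 'Cloudy', 'Snow']})

# Default: numeric columns only; NaN is excluded from the count
numeric_summary = df.describe()
print(numeric_summary)

# include='all': text columns get count, unique, top (most frequent) and freq
full_summary = df.describe(include='all')
print(full_summary)
```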

#### Indexing & Slicing In Pandas DataFrame

#### Conditionally Selecting Data From Your DataFrame (Using loc and iloc)
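A sketch of conditional selection on hypothetical rows shaped like the weather data. `loc` takes a boolean mask plus column labels, conditions are combined with `&` and `|` (each wrapped in parentheses), and `iloc` selects by integer position, where slices exclude the end position.

```python
import pandas as pd

df = pd.DataFrame({'day': ['1/1/2017', '1/4/2017', '1/5/2017', '1/10/2017'],
                   'temperature': [32.0, None, -1.0, 34.0],
                   'event': ['Rain', 'Sunny', 'Snow', 'Cloudy']})

# Rows where temperature > 30 (NaN compares as False), two chosen columns
warm = df.loc[df['temperature'] > 30, ['day', 'event']]
print(warm)

# Combine conditions with & (and) / | (or); parentheses are required
cold_snow = df.loc[(df['temperature'] < 0) & (df['event'] == 'Snow')]
print(cold_snow)

# iloc: rows at positions 0 and 1 (end excluded), first and last columns
first_two = df.iloc[0:2, [0, -1]]
print(first_two)
```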

#### Handling Missing Values
<li>We can use the fillna() method in pandas to fill missing values in different ways.</li>
<li>We can use the interpolate() method to estimate missing values from the neighboring values.</li>
<li>We can use the dropna() method to drop rows with missing values.</li>
<li>We can also fill missing values with the mean, median or mode of a column, depending on the kind of values it holds.</li>
<li>Filling missing values with the mean is appropriate when the column has continuous values without strong skew or outliers.</li>
<li>For skewed numeric data the median is usually safer, and for categorical data filling missing values with the mode (the most frequent value) is a good idea.</li>
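A minimal sketch of these options on a hypothetical DataFrame with one numeric and one categorical column: dropping incomplete rows, filling the numeric column with its mean and the categorical column with its mode, and linear interpolation.

```python
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, None, -1.0, None, 34.0],
                   'event': ['Rain', 'Sunny', None, 'Snow', 'Snow']})

# Drop every row that has at least one missing value
complete_rows = df.dropna()
print(complete_rows)

# Numeric column: fill with the mean; categorical column: fill with the mode
filled = df.copy()
filled['temperature'] = filled['temperature'].fillna(filled['temperature'].mean())
filled['event'] = filled['event'].fillna(filled['event'].mode()[0])
print(filled)

# Linear interpolation estimates each gap from its neighbours
interpolated = df['temperature'].interpolate()
print(interpolated)
```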

#### fillna(method = 'ffill')

#### fillna(method = 'bfill')
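Forward fill copies the last valid value forward, and backward fill copies the next valid value backward. In recent pandas versions (2.1 and later) the `method` argument of `fillna()` is deprecated in favour of the dedicated `ffill()` and `bfill()` methods, which this sketch uses on a hypothetical Series.

```python
import pandas as pd

s = pd.Series([32.0, None, None, -1.0, None])

# Forward fill: each NaN takes the last valid value before it
forward = s.ffill()
print(forward.tolist())  # [32.0, 32.0, 32.0, -1.0, -1.0]

# Backward fill: each NaN takes the next valid value after it;
# the trailing NaN has nothing after it, so it stays missing
backward = s.bfill()
print(backward.tolist())
```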