# Pandas
<li>Pandas is an open-source Python package, built on top of NumPy, for working with data sets.</li>
<li>The name "Pandas" is a reference to "panel data" and to <b>"Python Data Analysis".</b></li>
<li>Pandas is considered to be one of the best data-wrangling packages.</li>
<li>Pandas offers easy-to-use data structures and tools for analyzing, cleaning, exploring, and manipulating data.</li>
<li>It also works well with many other Python data science libraries.</li>


# Difference Between NumPy & Pandas

![](images/pandas_vs_numpy.png)

## Why Use Pandas?

<li>Pandas is known for its exceptional ability to represent and organize data.</li>
<li>The Pandas library was designed to work with large datasets quickly and efficiently.</li>
<li>It excels at analyzing large amounts of data. Pandas allows us to analyze large datasets and draw conclusions based on statistical theory.</li>
<li>Pandas can clean messy data sets, and make them readable and relevant.</li>
<li>By combining the functionality of Matplotlib and NumPy, Pandas offers users a powerful tool for performing <b>data analytics and visualization.</b></li>
<li>Data can be imported into Pandas from a variety of file formats, such as CSV, SQL, Excel, and JSON, among others.</li>
<li>Pandas is a versatile, marketable skill for data analysts and data scientists, and one that is valued by employers.</li>


## Installation Of Pandas
<li>Open your terminal, activate your virtual environment, and then run the following command to install pandas.</li>

<code>
    pip install pandas
</code>

## Importing Pandas
<li>We need to import pandas before we can create a pandas DataFrame and perform any analysis on it.</li>
<li>We can import the pandas package using the following statement:</li>
<code>
    import pandas as pd
</code>

In [2]:
import pandas as pd

## How To Create A Pandas DataFrame
<li>A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array, arranged in a table-like structure with rows and columns.</li>
<li>We can create a basic pandas DataFrame in several ways.</li>
<li>Let's look at some of the ways to create the DataFrame shown below:</li>

![](images/dataframe.png)

### 1. From Python Dictionary

In [2]:
df1 = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam',
                             'Sita', 'Mahima', 'Sunil', 'Bhawana'],
                   'Age': [24,34,50,32,18,23,22],
                   'Address': ['Manigram', 'Dhanewa', 'Bardaghat', 'Manglapur',
                              'Bharatpur', 'Kathmandu', 'Ramechap']})

In [3]:
df1

Unnamed: 0,Name,Age,Address
0,Prabhat,24,Manigram
1,Hari,34,Dhanewa
2,Shyam,50,Bardaghat
3,Sita,32,Manglapur
4,Mahima,18,Bharatpur
5,Sunil,23,Kathmandu
6,Bhawana,22,Ramechap


### 2. From a list of dictionaries

In [5]:
df2 = pd.DataFrame([{'Name': 'Prabhat', 'Age': 24, 'Address' : 'Manigram'},
                    {'Name': 'Hari', 'Age': 34, 'Address' : 'Dhanewa'},
                    {'Name': 'Shyam', 'Age': 50, 'Address' : 'Bardaghat'},
                    {'Name': 'Sita', 'Age': 32, 'Address' : 'Manglapur'},
                    {'Name': 'Mahima', 'Age': 18, 'Address' : 'Bharatpur'},
                    {'Name': 'Sunil', 'Age': 23, 'Address' : 'Kathmandu'},
                    {'Name': 'Bhawana', 'Age': 22, 'Address' : 'Ramechap'}
                   ])


In [6]:
df2

Unnamed: 0,Name,Age,Address
0,Prabhat,24,Manigram
1,Hari,34,Dhanewa
2,Shyam,50,Bardaghat
3,Sita,32,Manglapur
4,Mahima,18,Bharatpur
5,Sunil,23,Kathmandu
6,Bhawana,22,Ramechap


### 3. From a list of tuples

In [9]:
df3 = pd.DataFrame([('Prabhat', 24, 'Manigram'),
                    ('Hari', 34, 'Dhanewa'),
                    ('Shyam',50, 'Bardaghat'),
                    ('Sita', 32, 'Manglapur'),
                    ('Mahima', 18, 'Bharatpur'),
                    ('Sunil', 23, 'Kathmandu'),
                    ('Bhawana', 22, 'Ramechap')
                   ], columns = ['Name', 'Age', 'Address'])

In [10]:
df3

Unnamed: 0,Name,Age,Address
0,Prabhat,24,Manigram
1,Hari,34,Dhanewa
2,Shyam,50,Bardaghat
3,Sita,32,Manglapur
4,Mahima,18,Bharatpur
5,Sunil,23,Kathmandu
6,Bhawana,22,Ramechap


### 4. From a list of lists

In [11]:
df4 = pd.DataFrame([['Prabhat', 24, 'Manigram'],
                    ['Hari', 34, 'Dhanewa'],
                    ['Shyam',50, 'Bardaghat'],
                    ['Sita', 32, 'Manglapur'],
                    ['Mahima', 18, 'Bharatpur'],
                    ['Sunil', 23, 'Kathmandu'],
                    ['Bhawana', 22, 'Ramechap']
                   ], columns = ['Name', 'Age', 'Address'])

In [12]:
df4

Unnamed: 0,Name,Age,Address
0,Prabhat,24,Manigram
1,Hari,34,Dhanewa
2,Shyam,50,Bardaghat
3,Sita,32,Manglapur
4,Mahima,18,Bharatpur
5,Sunil,23,Kathmandu
6,Bhawana,22,Ramechap


In [14]:
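from csv import reader

# csv.reader yields each row as a list of strings, so the file becomes a list of lists.
# The with block closes the file automatically once the rows have been read.
with open('weather_data.csv', newline='') as file:
    data = list(reader(file))
print(data)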

[['day', 'temperature', 'windspeed', 'event'], ['1/1/2017', '32', '6', 'Rain'], ['1/4/2017', 'not available', '9', 'Sunny'], ['1/5/2017', '-1', 'not measured', 'Snow'], ['1/6/2017', 'not available', '7', 'no event'], ['1/7/2017', '32', 'not measured', 'Rain'], ['1/8/2017', 'not available', 'not measured', 'Sunny'], ['1/9/2017', 'not available', 'not measured', 'no event'], ['1/10/2017', '34', '8', 'Cloudy'], ['1/11/2017', '-4', '-1', 'Snow'], ['1/12/2017', '26', '12', 'Sunny'], ['1/13/2017', '12', '12', 'Rainy'], ['1/11/2017', '-1', '12', 'Snow'], ['1/14/2017', '40', '-1', 'Sunny']]


In [15]:
weather_df = pd.DataFrame(data[1:], columns = data[0])
weather_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


### 5. Pandas DataFrame From CSV Files

<li>We can load a CSV file and create a DataFrame from the data inside it using pandas.</li>
<li>The <b>pd.read_csv()</b> function reads a CSV file and returns a pandas DataFrame built from the dataset.</li>

In [16]:
weather_df = pd.read_csv('weather_data.csv')

In [17]:
weather_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


#### Reading a CSV file using the skiprows and header parameters

In [26]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 1)

In [27]:
weather_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


In [34]:
weather_df = pd.read_csv('weather_data.csv', header = 2)
weather_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny
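
The effect of `skiprows` and `header` depends on what sits above the real header line in the file. The sketch below assumes a hypothetical file with two junk lines before the header (the `raw` string is made up for illustration and is not the actual content of `weather_data.csv`) and reads it from memory with `io.StringIO`.

```python
import io
import pandas as pd

# Hypothetical file contents: two junk lines sit above the real header row.
raw = """station report
generated in 2017
day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
"""

# skiprows=2 drops the first two lines, so the third line becomes the header.
df_skip = pd.read_csv(io.StringIO(raw), skiprows=2)

# header=2 says that row index 2 holds the column names; the rows above it are ignored.
df_header = pd.read_csv(io.StringIO(raw), header=2)

print(df_skip.columns.tolist())
print(df_header)
```

In this setup both calls produce the same DataFrame, so either parameter can be used to get past leading lines that are not data.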


#### Reading a CSV file without a header and giving names to the columns

In [40]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 3, header = None,
                        names = ['dates', 'temp', 'ws', 'forecast'])
weather_df

Unnamed: 0,dates,temp,ws,forecast
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


#### Read a limited number of rows from a CSV file using the nrows parameter


In [43]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 3,nrows = 5, header = None,
                        names = ['dates', 'temp', 'ws', 'forecast'])
weather_df

Unnamed: 0,dates,temp,ws,forecast
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain


#### Reading CSV files with the na_values parameter ('weather_data.csv' file)


In [52]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 2,
#                         na_values = ['not available', 'not measured', 
#                                     'no event']
                        )
weather_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/4/2017,not available,9,Sunny
2,1/5/2017,-1,not measured,Snow
3,1/6/2017,not available,7,no event
4,1/7/2017,32,not measured,Rain
5,1/8/2017,not available,not measured,Sunny
6,1/9/2017,not available,not measured,no event
7,1/10/2017,34,8,Cloudy
8,1/11/2017,-4,-1,Snow
9,1/12/2017,26,12,Sunny


In [55]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 2,
                        na_values = {'temperature': 'not available',
                                     'windspeed': ['not measured', -1],
                                    'event': 'no event'})
weather_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain
5,1/8/2017,,,Sunny
6,1/9/2017,,,
7,1/10/2017,34.0,8.0,Cloudy
8,1/11/2017,-4.0,,Snow
9,1/12/2017,26.0,12.0,Sunny
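
To check what a per-column `na_values` mapping actually did, we can count the missing values in each column. This is a minimal sketch on a small in-memory sample; the `raw` string is a hypothetical stand-in that copies a few rows from the output above.

```python
import io
import pandas as pd

# Hypothetical sample that mimics the placeholder strings in weather_data.csv.
raw = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
1/5/2017,-1,not measured,Snow
1/6/2017,not available,7,no event
1/11/2017,-4,-1,Snow
"""

df = pd.read_csv(
    io.StringIO(raw),
    na_values={'temperature': 'not available',
               'windspeed': ['not measured', -1],
               'event': 'no event'},
)

# Count the missing values per column.
missing = df.isna().sum()
print(missing)

# -1 is a missing marker only for windspeed, so a temperature of -1 is kept.
print(df.loc[2, 'temperature'])
```

Because the mapping is keyed by column name, the same raw value can be treated as missing in one column and as valid data in another.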


#### Write a pandas DataFrame to a CSV file
<li>We can write a pandas DataFrame to a CSV file using the .to_csv() method.</li>
<li>You can give the CSV file any name when writing a pandas DataFrame to it.</li>

In [57]:
weather_df.to_csv('weather_data_nan.csv', index = False)
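
The `index = False` argument matters when the file is read back. The sketch below, which writes to in-memory buffers instead of real files and uses a small hypothetical DataFrame, compares the two settings.

```python
import io
import pandas as pd

# Small hypothetical DataFrame used only for this comparison.
df = pd.DataFrame({'Name': ['Prabhat', 'Hari'], 'Age': [24, 34]})

# index=False writes only the data columns.
no_index = io.StringIO()
df.to_csv(no_index, index=False)

# The default (index=True) also writes the row labels as a first, unnamed column.
with_index = io.StringIO()
df.to_csv(with_index)

# Rewind the buffers before reading them back.
no_index.seek(0)
with_index.seek(0)

cols_no_index = pd.read_csv(no_index).columns.tolist()
cols_with_index = pd.read_csv(with_index).columns.tolist()

print(cols_no_index)
print(cols_with_index)
```

When the index is written, reading the file back produces an extra column that pandas labels `Unnamed: 0`, which is usually not wanted for a default integer index.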

### 6. Pandas DataFrame From Excel Files

<li>We can load an Excel file with the <b>.xlsx</b> extension and create a DataFrame from the data inside it using pandas.</li>
<li>The <b>pd.read_excel()</b> function reads an Excel file and returns a pandas DataFrame built from the dataset.</li>
<li>We also need to install <b>openpyxl</b> to work with .xlsx files.</li>

In [4]:
weather_df = pd.read_excel('weather_data.xlsx',
                           na_values = {'temperature': 'not available',
                                     'windspeed': ['not measured', -1],
                                    'event': 'no event'})
weather_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,,9.0,Sunny
2,1/5/2017,-1.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain
5,1/8/2017,,,Sunny
6,1/9/2017,,,
7,1/10/2017,34.0,8.0,Cloudy
8,1/11/2017,-4.0,,Snow
9,1/12/2017,26.0,12.0,Sunny


#### Writing to an excel file
<li>We can write a pandas DataFrame to an Excel file using the .to_excel() method.</li>

In [5]:
# Pass the sheet name by keyword; newer pandas versions expect it that way.
weather_df.to_excel('weather_data.xlsx', sheet_name = 'nans')

#### Using head() and tail() method to see top 5 and last 5 rows
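
A minimal sketch of both methods, assuming a DataFrame `df` that holds the same sample data as `df1` above:

```python
import pandas as pd

# Same sample data as df1 above (Address column left out for brevity).
df = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam', 'Sita',
                            'Mahima', 'Sunil', 'Bhawana'],
                   'Age': [24, 34, 50, 32, 18, 23, 22]})

first_rows = df.head()    # first 5 rows by default
last_rows = df.tail()     # last 5 rows by default
first_two = df.head(2)    # pass a number to change how many rows are returned

print(first_rows)
print(last_rows)
print(first_two)
```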

#### Indexing & Slicing In Pandas DataFrame
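
The square-bracket operator covers the most common cases. The sketch below uses a small hypothetical `df` with a few rows from the sample data above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam', 'Sita'],
                   'Age': [24, 34, 50, 32],
                   'Address': ['Manigram', 'Dhanewa', 'Bardaghat', 'Manglapur']})

ages = df['Age']               # a single column name returns a Series
subset = df[['Name', 'Age']]   # a list of column names returns a DataFrame
middle = df[1:3]               # a slice selects rows by position; the end is exclusive

print(ages)
print(subset)
print(middle)
```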

#### Finding the column names from the dataframe
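
The column labels are available through the `columns` attribute. A short sketch with a hypothetical two-row `df`:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prabhat', 'Hari'],
                   'Age': [24, 34],
                   'Address': ['Manigram', 'Dhanewa']})

print(df.columns)                      # an Index object holding the labels

column_names = df.columns.tolist()     # plain Python list of the labels
has_age = 'Age' in df.columns          # membership test on the labels

print(column_names)
print(has_age)
```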

#### Checking the type of your dataframe 
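
Python's built-in `type()` shows which pandas object we are holding. A short sketch with a hypothetical `df`:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prabhat', 'Hari'], 'Age': [24, 34]})

frame_type = type(df)           # the whole table is a DataFrame
column_type = type(df['Age'])   # a single column is a Series

print(frame_type)
print(column_type)
```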

#### Checking the datatypes of overall dataframe using .info() method
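
A minimal sketch, assuming a small hypothetical `df`. The `.info()` method prints the column names, non-null counts, and dtypes, while the `dtypes` attribute returns only the dtype of each column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam'],
                   'Age': [24, 34, 50]})

# Prints a summary: index range, columns, non-null counts, dtypes, memory usage.
df.info()

# Series mapping each column name to its dtype.
print(df.dtypes)

# Helper from pandas.api.types for checking a column's dtype in code.
age_is_integer = pd.api.types.is_integer_dtype(df['Age'])
print(age_is_integer)
```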

#### Performing operations with your dataframe

<li>We can calculate the average value of a particular column using df.column_name.mean().</li>
<li>For calculating the minimum value in a particular column, we can use df.column_name.min().</li>
<li>Similarly, for calculating the maximum value in a particular column, we can use df.column_name.max().</li>
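
The three calls above can be sketched on the `Age` column of the sample data. The name `df` is an assumption for illustration. Attribute access such as `df.Age` works only when the column name is a valid Python identifier; `df['Age']` is equivalent and works for any column name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam', 'Sita',
                            'Mahima', 'Sunil', 'Bhawana'],
                   'Age': [24, 34, 50, 32, 18, 23, 22]})

average_age = df.Age.mean()   # same as df['Age'].mean()
youngest = df.Age.min()
oldest = df.Age.max()

print(average_age, youngest, oldest)
```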

#### Finding the descriptive statistics of the dataframe using .describe() method
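
A minimal sketch, assuming a `df` with the sample names and ages used above. By default `.describe()` summarizes only the numeric columns.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam', 'Sita',
                            'Mahima', 'Sunil', 'Bhawana'],
                   'Age': [24, 34, 50, 32, 18, 23, 22]})

# Rows of the result: count, mean, std, min, 25%, 50%, 75%, max.
summary = df.describe()
print(summary)

# Individual statistics can be looked up by row label and column name.
median_age = summary.loc['50%', 'Age']
print(median_age)
```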

#### Conditionally Select the data from your dataframe (using loc and iloc method as well)
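
A boolean condition on a column produces a mask that selects the matching rows. The sketch below assumes a `df` holding the sample data from above and shows a plain mask, `loc` (label-based), and `iloc` (position-based).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam', 'Sita',
                            'Mahima', 'Sunil', 'Bhawana'],
                   'Age': [24, 34, 50, 32, 18, 23, 22],
                   'Address': ['Manigram', 'Dhanewa', 'Bardaghat', 'Manglapur',
                               'Bharatpur', 'Kathmandu', 'Ramechap']})

# Rows where the condition is True.
over_30 = df[df['Age'] > 30]

# loc: filter rows with a condition and pick a column by label.
names_over_30 = df.loc[df['Age'] > 30, 'Name']

# Combine conditions with & (and) or | (or); each one needs its own parentheses.
young_in_kathmandu = df[(df['Age'] < 30) & (df['Address'] == 'Kathmandu')]

# iloc: select by integer position (first two rows, first two columns).
top_left = df.iloc[0:2, 0:2]

print(over_30)
print(names_over_30.tolist())
print(young_in_kathmandu)
print(top_left)
```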

#### Convert a String Column Into Datetime Type
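
One way to do this is `pd.to_datetime()`. The sketch below assumes the `day` strings follow a month/day/year layout, as the weather data above appears to; the `format` string is that assumption made explicit.

```python
import pandas as pd

df = pd.DataFrame({'day': ['1/1/2017', '1/4/2017', '1/5/2017'],
                   'temperature': [32, 28, -1]})

# Parse the strings into datetime values (assumed layout: month/day/year).
df['day'] = pd.to_datetime(df['day'], format='%m/%d/%Y')

is_datetime = pd.api.types.is_datetime64_any_dtype(df['day'])

# The .dt accessor exposes the date parts once the column is a datetime.
months = df['day'].dt.month.tolist()
days = df['day'].dt.day.tolist()

print(df.dtypes)
print(months, days)
```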

#### set_index() and reset_index() method
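
A minimal sketch, assuming a small hypothetical `df`. `set_index()` turns a column into the row labels, and `reset_index()` moves the labels back into a regular column. Both return a new DataFrame and leave the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prabhat', 'Hari', 'Shyam'],
                   'Age': [24, 34, 50]})

# Use the Name column as the row labels.
indexed = df.set_index('Name')

# Rows can now be looked up by name with loc.
hari_age = indexed.loc['Hari', 'Age']

# Move the labels back into a column and restore the default integer index.
restored = indexed.reset_index()

print(indexed)
print(hari_age)
print(restored)
```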

#### Handling Missing Values
<li>We can use the fillna() method in pandas to fill missing values in different ways.</li>
<li>We can use the interpolate() method to estimate missing values from the neighboring values.</li>
<li>We can use the dropna() method to drop rows with missing values.</li>
<li>We can also fill missing values with the mean, median, or mode, depending on the values in the column.</li>
<li>Filling missing values with the mean is appropriate when the column has continuous values.</li>
<li>The median is a better choice for numeric columns with outliers or skewed values, and the mode is a good choice when the data is categorical.</li>
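
The options listed above can be sketched on a small made-up DataFrame. The column names and values here are hypothetical and chosen so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temperature': [32.0, np.nan, 28.0, np.nan, 30.0],
                   'event': ['Rain', 'Sunny', None, 'Rain', 'Snow']})

# Numeric column: fill the gaps with the mean of the known values (30.0 here).
mean_filled = df['temperature'].fillna(df['temperature'].mean())

# Numeric column: estimate each gap from its neighbors (linear by default).
interpolated = df['temperature'].interpolate()

# Categorical column: fill the gaps with the most frequent value.
mode_filled = df['event'].fillna(df['event'].mode()[0])

# Drop every row that has at least one missing value.
complete_rows = df.dropna()

print(mean_filled.tolist())
print(interpolated.tolist())
print(mode_filled.tolist())
print(complete_rows)
```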

#### fillna(method = 'ffill')
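
Forward fill copies the last known value into the gaps that follow it. Recent pandas releases deprecate the `method` argument of `fillna()` in favor of the dedicated `.ffill()` method, so this sketch uses `.ffill()`; the sample values are made up.

```python
import numpy as np
import pandas as pd

temps = pd.Series([32.0, np.nan, np.nan, 28.0, np.nan])

# Each missing value takes the most recent non-missing value before it.
forward_filled = temps.ffill()

print(forward_filled.tolist())
```

A missing value at the very start of the column stays missing, because there is no earlier value to copy.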

#### fillna(method = 'bfill')
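
Backward fill copies the next known value into the gaps that come before it. The sketch uses the dedicated `.bfill()` method, which recent pandas releases recommend over `fillna(method = 'bfill')`; the sample values are made up.

```python
import numpy as np
import pandas as pd

temps = pd.Series([np.nan, 32.0, np.nan, 28.0, np.nan])

# Each missing value takes the next non-missing value after it.
backward_filled = temps.bfill()

# The last value has nothing after it, so it stays missing.
last_is_missing = pd.isna(backward_filled.iloc[-1])

print(backward_filled.tolist())
print(last_is_missing)
```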