# Pandas
<li>Pandas is an open-source Python package, built on top of NumPy, for working with datasets.</li>
<li>The name "Pandas" is derived from the term "panel data" and is also a play on <b>"Python Data Analysis".</b></li>
<li>Pandas is considered to be one of the best data-wrangling packages.</li>
<li>Pandas offers easy-to-use data structures and tools for analyzing, cleaning, exploring and manipulating data.</li>
<li>It also works well with other Python data science libraries, such as NumPy and Matplotlib.</li>


## Why Use Pandas?

<li>Pandas is known for its exceptional ability to represent and organize data.</li>
<li>The Pandas library was designed to work with large datasets quickly and efficiently; much of its speed comes from the vectorized NumPy operations it uses under the hood.</li>
<li>It excels at analyzing large amounts of data. Pandas allows us to analyze large datasets and draw conclusions based on statistical methods.</li>
<li>Pandas can clean messy datasets and make them readable and relevant.</li>
<li>Built on NumPy and integrated with Matplotlib for plotting, Pandas offers users a powerful tool for performing <b>data analytics and visualization.</b></li>
<li>Data can be imported into Pandas from a variety of sources, such as CSV files, SQL databases, Excel files and JSON, among others.</li>
<li>Pandas is a versatile and marketable skill set for data analysts and data scientists that can gain the attention of employers.</li>


## Installation Of Pandas
<li>Open your terminal, activate your virtual environment, and then run the following command to install pandas:</li>

<code>
    pip install pandas
</code>

## Importing Pandas
<li>We need to import pandas before we can create a pandas dataframe and perform any analysis on it.</li>
<li>We can import the pandas package using the following command:</li>
<code>
    import pandas as pd
</code>

In [1]:
import pandas as pd

## How To Create A Pandas DataFrame
<li>A Pandas DataFrame is a two-dimensional data structure, like a 2D array, arranged in a table with rows and columns.</li>
<li>We can create a basic pandas dataframe in various ways.</li>
<li>Let's discuss some of the methods for creating the dataframe shown below:</li>

![](images/dataframe.png)

### 1. From Python Dictionary

In [2]:
df = pd.DataFrame({"Name": ['Prabhat', 'Ram', 'Shyam', 'Asmita', 'Alisha'],
                  "Age": [24, 23, 32, 21, 20],
                  "Gender": ['Male', 'Male', 'Male', 'Female', 'Female']})

In [3]:
df

|   | Name | Age | Gender |
|---|------|-----|--------|
| 0 | Prabhat | 24 | Male |
| 1 | Ram | 23 | Male |
| 2 | Shyam | 32 | Male |
| 3 | Asmita | 21 | Female |
| 4 | Alisha | 20 | Female |


### 2. From a list of dictionaries

In [4]:
df = pd.DataFrame([{'Name': 'Prabhat', 'Age': 24, 'Gender': 'Male'},
                  {'Name': 'Ram', 'Age': 23, 'Gender': 'Male'},
                  {'Name': 'Shyam', 'Age': 32, 'Gender': 'Male'},
                  {'Name': 'Asmita', 'Age': 21, 'Gender': 'Female'},
                  {'Name': 'Alisha', 'Age': 20, 'Gender': 'Female'}])

In [5]:
df

|   | Name | Age | Gender |
|---|------|-----|--------|
| 0 | Prabhat | 24 | Male |
| 1 | Ram | 23 | Male |
| 2 | Shyam | 32 | Male |
| 3 | Asmita | 21 | Female |
| 4 | Alisha | 20 | Female |


### 3. From a list of tuples

In [8]:
df = pd.DataFrame([('Prabhat', 24,'Male'),
                  ('Ram', 23, 'Male'),
                  ('Shyam', 32, 'Male'),
                  ('Asmita', 21, 'Female'),
                  ('Alisha', 20, 'Female')],
                 columns = ['Name', 'Age', 'Gender'])

In [9]:
df

|   | Name | Age | Gender |
|---|------|-----|--------|
| 0 | Prabhat | 24 | Male |
| 1 | Ram | 23 | Male |
| 2 | Shyam | 32 | Male |
| 3 | Asmita | 21 | Female |
| 4 | Alisha | 20 | Female |


### 4. From a list of lists

In [12]:
df = pd.DataFrame([['Prabhat', 24,'Male'],
                  ['Ram', 23, 'Male'],
                  ['Shyam', 32, 'Male'],
                  ['Asmita', 21, 'Female'],
                  ['Alisha', 20, 'Female']],
                 columns = ['Name', 'Age', 'Gender'])

In [13]:
df

|   | Name | Age | Gender |
|---|------|-----|--------|
| 0 | Prabhat | 24 | Male |
| 1 | Ram | 23 | Male |
| 2 | Shyam | 32 | Male |
| 3 | Asmita | 21 | Female |
| 4 | Alisha | 20 | Female |


## Question:
<li>Read the 'weather_data.csv' file using the csv module's reader.</li>
<li>Store the data in the CSV file as a list of lists.</li>
<li>Then create a pandas dataframe from that list of lists.</li>

In [14]:
from csv import reader

In [15]:
with open('weather_data.csv', newline='') as csv_file:
    # reader() yields each row of the file as a list of strings
    data = list(reader(csv_file))
print(data)

[['kfjkdfjskd'], ['dfuhsdjufio'], ['day', 'temperature', 'windspeed', 'event'], ['1/1/2017', '32', '6', 'Rain'], ['1/4/2017', 'not available', '9', 'Sunny'], ['1/5/2017', '-1', 'not measured', 'Snow'], ['1/6/2017', 'not available', '7', 'no event'], ['1/7/2017', '32', 'not measured', 'Rain'], ['1/8/2017', 'not available', 'not measured', 'Sunny'], ['1/9/2017', 'not available', 'not measured', 'no event'], ['1/10/2017', '34', '8', 'Cloudy'], ['1/11/2017', '-4', '-1', 'Snow'], ['1/12/2017', '26', '12', 'Sunny'], ['1/13/2017', '12', '12', 'Rainy'], ['1/11/2017', '-1', '12', 'Snow'], ['1/14/2017', '40', '-1', 'Sunny']]


In [19]:
columns = data[2]
print(columns)

['day', 'temperature', 'windspeed', 'event']


In [21]:
list_of_list_data = data[3:]

In [22]:
weather_df = pd.DataFrame(data=list_of_list_data,
                          columns=columns)
weather_df

|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |


### 5. Pandas DataFrame From CSV Files

<li>We can load a CSV file and create a dataframe from the data inside it using pandas.</li>
<li>The <b>pd.read_csv()</b> function reads a CSV file and creates a pandas dataframe from the dataset.</li>

In [23]:
car_details_df = pd.read_csv('car_details.csv')
car_details_df

|   | name | year | selling_price | km_driven | fuel | seller_type | transmission | owner |
|---|------|------|---------------|-----------|------|-------------|--------------|-------|
| 0 | Maruti 800 AC | 2007 | 60000 | 70000 | Petrol | Individual | Manual | First Owner |
| 1 | Maruti Wagon R LXI Minor | 2007 | 135000 | 50000 | Petrol | Individual | Manual | First Owner |
| 2 | Hyundai Verna 1.6 SX | 2012 | 600000 | 100000 | Diesel | Individual | Manual | First Owner |
| 3 | Datsun RediGO T Option | 2017 | 250000 | 46000 | Petrol | Individual | Manual | First Owner |
| 4 | Honda Amaze VX i-DTEC | 2014 | 450000 | 141000 | Diesel | Individual | Manual | Second Owner |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4335 | Hyundai i20 Magna 1.4 CRDi (Diesel) | 2014 | 409999 | 80000 | Diesel | Individual | Manual | Second Owner |
| 4336 | Hyundai i20 Magna 1.4 CRDi | 2014 | 409999 | 80000 | Diesel | Individual | Manual | Second Owner |
| 4337 | Maruti 800 AC BSIII | 2009 | 110000 | 83000 | Petrol | Individual | Manual | Second Owner |
| 4338 | Hyundai Creta 1.6 CRDi SX Option | 2016 | 865000 | 90000 | Diesel | Individual | Manual | First Owner |


In [27]:
weather_df = pd.read_csv('weather_data.csv', skiprows = 2)
weather_df

|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |


In [31]:
weather_df = pd.read_csv('weather_data.csv', header = 2)
weather_df

|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |
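The two calls above give the same result for this file: skiprows=2 discards the first two lines and then uses the next line as the header, while header=2 treats line index 2 (the third line) as the header and drops everything before it. The sketch below checks this on a small in-memory copy of the first few lines, assuming the file layout shown in the csv reader output (two junk lines, then the header); io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# First few lines of a file laid out like 'weather_data.csv':
# two junk lines, then the header row, then data rows.
raw = """kfjkdfjskd
dfuhsdjufio
day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
"""

# Skip the two junk lines; the next line becomes the header.
by_skiprows = pd.read_csv(io.StringIO(raw), skiprows=2)

# Treat line index 2 (the third line) as the header; earlier lines are dropped.
by_header = pd.read_csv(io.StringIO(raw), header=2)

print(by_skiprows.columns.tolist())  # ['day', 'temperature', 'windspeed', 'event']
print(by_skiprows.equals(by_header))  # True
```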


#### Reading a csv file without a header and giving names to the columns

In [36]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 3, 
                         header = None, names = ['day', 'temp', 'ws', 'event'])
weather_df

|   | day | temp | ws | event |
|---|-----|------|----|-------|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |
| 8 | 1/11/2017 | -4 | -1 | Snow |
| 9 | 1/12/2017 | 26 | 12 | Sunny |


#### Reading a limited number of rows from a csv file using the nrows parameter


In [37]:
weather_df = pd.read_csv('weather_data.csv',skiprows = 3, nrows = 5,
                         header = None, names = ['day', 'temp', 'ws', 'event'])
weather_df

|   | day | temp | ws | event |
|---|-----|------|----|-------|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |


#### Reading csv files with the na_values parameter ('weather_data.csv' file)


In [40]:
weather_df = pd.read_csv('weather_data.csv', skiprows = 2, 
                        na_values = ['not available', 'not measured', 'no event'])
weather_df

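|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/4/2017 | NaN | 9.0 | Sunny |
| 2 | 1/5/2017 | -1.0 | NaN | Snow |
| 3 | 1/6/2017 | NaN | 7.0 | NaN |
| 4 | 1/7/2017 | 32.0 | NaN | Rain |
| 5 | 1/8/2017 | NaN | NaN | Sunny |
| 6 | 1/9/2017 | NaN | NaN | NaN |
| 7 | 1/10/2017 | 34.0 | 8.0 | Cloudy |
| 8 | 1/11/2017 | -4.0 | -1.0 | Snow |
| 9 | 1/12/2017 | 26.0 | 12.0 | Sunny |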


In [41]:
weather_df = pd.read_csv('weather_data.csv', skiprows = 2, 
                        na_values = {'temperature': ['not available'],
                                     'windspeed': ['not measured', -1],
                                     'event' : ['no event']})
weather_df

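|   | day | temperature | windspeed | event |
|---|-----|-------------|-----------|-------|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/4/2017 | NaN | 9.0 | Sunny |
| 2 | 1/5/2017 | -1.0 | NaN | Snow |
| 3 | 1/6/2017 | NaN | 7.0 | NaN |
| 4 | 1/7/2017 | 32.0 | NaN | Rain |
| 5 | 1/8/2017 | NaN | NaN | Sunny |
| 6 | 1/9/2017 | NaN | NaN | NaN |
| 7 | 1/10/2017 | 34.0 | 8.0 | Cloudy |
| 8 | 1/11/2017 | -4.0 | NaN | Snow |
| 9 | 1/12/2017 | 26.0 | 12.0 | Sunny |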
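The difference between the two calls is scope: a list passed to na_values applies to every column, while a dict limits each marker to the named column. That is why -1 becomes NaN only in windspeed in the second output, and the temperature of -1 is kept. A minimal sketch on a few made-up rows in the same layout:

```python
import io

import pandas as pd

raw = """day,temperature,windspeed,event
1/5/2017,-1,not measured,Snow
1/11/2017,-4,-1,Snow
1/6/2017,not available,7,no event
"""

# A dict limits each missing-value marker to one column,
# so -1 is treated as missing in 'windspeed' but kept in 'temperature'.
df = pd.read_csv(
    io.StringIO(raw),
    na_values={
        "temperature": ["not available"],
        "windspeed": ["not measured", -1],
        "event": ["no event"],
    },
)
print(df)
```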


#### Write a pandas dataframe to a csv file
<li>We can write a pandas dataframe to a CSV file using the .to_csv() method.</li>
<li>You can give the CSV file any name; passing index=False stops pandas from writing the row index as an extra column.</li>

In [43]:
weather_df.to_csv('weather_data_nan.csv', index = False)
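To see what index=False changes without creating a file, the sketch below calls to_csv() with no path, in which case it returns the CSV text as a string. The two-row DataFrame is made up for illustration; note that NaN is written as an empty field by default.

```python
import pandas as pd

df = pd.DataFrame({"day": ["1/1/2017", "1/4/2017"], "temperature": [32.0, None]})

# With no path argument, to_csv() returns the CSV text as a string,
# which makes it easy to see what would be written to disk.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first column holds the row index 0, 1
print(without_index)  # only the 'day' and 'temperature' columns
```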

### 6. Pandas DataFrame From Excel Files

<li>We can load an Excel file with the <b>.xlsx</b> extension and create a dataframe from its data using pandas.</li>
<li>The <b>pd.read_excel()</b> function reads an Excel file and creates a pandas dataframe from the dataset.</li>
<li>We also need to install <b>openpyxl</b> (for example with pip install openpyxl), which pandas uses to read and write .xlsx files.</li>

#### Writing to an excel file
<li>We can write a pandas dataframe to an Excel file using the .to_excel() method.</li>
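A round trip through to_excel() and read_excel() can be sketched as follows. This assumes openpyxl is installed; an in-memory io.BytesIO buffer stands in for a file, and the sheet name "people" and the small DataFrame are made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Prabhat", "Ram"], "Age": [24, 23]})

# Write to an in-memory buffer instead of a file on disk;
# a hypothetical file name such as 'people.xlsx' works the same way.
buffer = io.BytesIO()
df.to_excel(buffer, index=False, sheet_name="people")

# Rewind the buffer and read the sheet back into a new DataFrame.
buffer.seek(0)
round_trip = pd.read_excel(buffer, sheet_name="people")
print(round_trip)
```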

#### Using head() and tail() methods to see the first 5 and last 5 rows
<li>To view the first few rows of our dataframe, we can use the DataFrame.head() method.</li>
<li>By default, it returns the first five rows of our dataframe.</li>
<li>However, it also accepts an optional integer parameter, which specifies the number of rows.</li>

<li>Similarly, to view the last few rows of our dataframe, we can use the DataFrame.tail() method.</li>
<li>By default, it returns the last five rows of our dataframe.</li>
<li>However, it also accepts an optional integer parameter, which specifies the number of rows.</li>
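A quick sketch of both methods on a made-up ten-row DataFrame, using the default of five rows for head() and an explicit count for tail():

```python
import pandas as pd

# A hypothetical DataFrame with ten rows of sequential numbers.
df = pd.DataFrame({"n": range(10)})

print(df.head())   # first five rows (index 0-4)
print(df.tail(3))  # last three rows (index 7-9)
```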

#### Finding the column names from the dataframe
<li>The df.columns attribute returns the names of the columns in the pandas dataframe.</li>
<li>Similarly, the df.values attribute returns the data in the pandas dataframe as a NumPy array.</li>
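A small sketch of both attributes on a made-up DataFrame. The pandas documentation also recommends DataFrame.to_numpy() as the preferred alternative to .values; with mixed column types the resulting array has object dtype.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Prabhat", "Ram"], "Age": [24, 23]})

print(df.columns)           # an Index object holding the column labels
print(df.columns.tolist())  # ['Name', 'Age']
print(df.values)            # 2-D NumPy array, one row per DataFrame row
```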

#### Checking the data types of your dataframe
<li>Another feature that makes pandas better for working with data is that dataframes can contain more than one data type.</li>
<li>Axis values can have string labels, not just numeric ones.</li>
<li>Dataframes can contain columns with multiple data types: including integer, float, and string.</li>
<li>We can use the DataFrame.dtypes attribute (similar to NumPy) to return information about the types of each column.</li>
<li>When we import data, pandas attempts to guess the correct dtype for each column.</li>
<li>Generally, pandas does well with this, which means we don't need to worry about specifying dtypes every time we start to work with data.</li>
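A sketch of DataFrame.dtypes on a made-up DataFrame with one string, one integer and one float column (the Height column is hypothetical). The result is a Series indexed by column name, so a single column's dtype can be looked up by label.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Prabhat", "Ram"],  # strings
    "Age": [24, 23],             # integers
    "Height": [1.75, 1.68],      # floats (hypothetical column)
})

# One dtype per column, returned as a Series indexed by column name.
print(df.dtypes)
```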



#### Shape And Datatypes Information
<li>We can get the shape of the dataset using the <b>.shape</b> attribute.</li>
<li><b>.shape</b> is a tuple containing the number of rows and the number of columns in the dataset.</li>
<li>If we want an overview of all the dtypes used in our dataframe, we can use the <b>.info()</b> method.</li>
<li>Note that <b>DataFrame.info()</b> prints the information rather than returning it, so assigning its result to a variable just gives None.</li>
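A short sketch on a made-up DataFrame showing that .shape is accessed without parentheses and that info() returns None after printing its summary:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Prabhat", "Ram", "Shyam"], "Age": [24, 23, 32]})

print(df.shape)        # (3, 2): three rows, two columns
rows, cols = df.shape  # shape is an attribute (a tuple), so no parentheses

result = df.info()     # prints column names, non-null counts and dtypes
print(result)          # None: info() prints rather than returns
```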


#### Checking the null values in the pandas dataframe
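One common way to check for null values, sketched here on a small made-up DataFrame, is to combine isnull() with sum(): each missing cell is marked True, and summing the booleans counts them per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [32.0, np.nan, -1.0, np.nan],
    "event": ["Rain", "Sunny", None, "Snow"],
})

# isnull() marks each missing cell as True; summing the booleans
# counts the missing values in every column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# any() reports whether each column has at least one missing value.
print(df.isnull().any())
```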

#### set_index() and reset_index() methods
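A minimal sketch of the two methods on a made-up DataFrame: set_index() turns a column into the row labels, and reset_index() moves the labels back into a regular column with the default integer index.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Prabhat", "Ram", "Shyam"], "Age": [24, 23, 32]})

# set_index() moves the 'Name' column into the row index,
# so rows can then be looked up by name.
by_name = df.set_index("Name")
print(by_name.loc["Ram", "Age"])  # 23

# reset_index() moves the index back into a regular column
# and restores the default 0, 1, 2, ... integer index.
restored = by_name.reset_index()
print(restored.equals(df))  # True
```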

#### Selecting a column from a pandas DataFrame

<li>Since the axes in pandas have labels, we can select data using those labels.</li>
<li>Unlike in NumPy, we do not need to know the exact integer position of the data in a pandas dataframe.</li>
<li>To do this, we can use the DataFrame.loc[] indexer. The syntax for DataFrame.loc[] is:</li>
<code>
df.loc[row_label, column_label]
</code>

<li>We can use the following shortcut to select a single column:</li>
<code>
df["column_name"]
</code>

<li>This style of selecting columns is very common.</li>
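The sketch below, on a made-up DataFrame, checks that the loc[] form and the bracket shortcut return the same column, and shows loc[] with a row label and a column label picking out a single value:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Prabhat", "Ram"], "Age": [24, 23]})

# ':' selects all rows; 'Age' selects the column by its label.
ages_loc = df.loc[:, "Age"]

# The bracket shortcut selects the same column.
ages = df["Age"]

print(ages_loc.equals(ages))  # True
print(df.loc[1, "Age"])       # 23: row label 1, column label 'Age'
```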


#### Questions

<li>Read <b>'appointment_schedule.csv'</b> file using pandas.</li>
<li>Select the <b>'name'</b> column from the given dataset and store to <b>'appointment_names'</b> variable.</li>
<li>Use Python's <b>type()</b> function to find the type of the name column and assign it to <b>name_type</b>.</li>

#### Pandas Series
<li>Series is the pandas type for one-dimensional objects.</li>
<li>Anytime you see a 1D pandas object, it will be a series. Anytime you see a 2D pandas object, it will be a dataframe.</li>
<li>A dataframe can be thought of as a collection of Series objects, one per column, that share the same row index.</li>
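A quick sketch, on a made-up DataFrame, confirming that a single selected column is a Series, along with a Series built directly with its own index labels:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Prabhat", "Ram"], "Age": [24, 23]})

print(type(df))         # <class 'pandas.core.frame.DataFrame'>
print(type(df["Age"]))  # <class 'pandas.core.series.Series'>

# A Series can also be built directly; its index labels are optional.
ages = pd.Series([24, 23], index=["Prabhat", "Ram"], name="Age")
print(ages["Ram"])  # 23
```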

#### Adding a column in a pandas dataframe
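A minimal sketch of adding columns by assigning to a new label, using a made-up DataFrame; the column names Active and Age_in_months are hypothetical. A scalar is repeated for every row, while an expression on existing columns is computed element by element.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Prabhat", "Ram"], "Age": [24, 23]})

# Assigning to a new label creates the column; a scalar is repeated for every row.
df["Active"] = True

# A column can also be derived from existing columns, element by element.
df["Age_in_months"] = df["Age"] * 12
print(df)
```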

### Selecting Multiple Columns From the DataFrame

![](images/selecting_columns.png)

<li>We can select multiple columns from the dataframe using the following code:</li>
<code>
    df.loc[:, ["col1", "col2"]]
</code>

<li>We can also use the following shortcut syntax to select multiple columns:</li>
<code>
    df[["col1", "col2"]]
</code>
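A sketch with a few made-up rows in the car_details layout: passing a list of labels returns a DataFrame with the columns in the listed order, and both syntaxes give the same result.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Maruti 800 AC", "Hyundai Verna 1.6 SX"],
    "year": [2007, 2012],
    "selling_price": [60000, 600000],
    "km_driven": [70000, 100000],
})

# Passing a list of labels returns a DataFrame (not a Series),
# with the columns in the order they were listed.
subset = df[["name", "selling_price"]]
same_subset = df.loc[:, ["name", "selling_price"]]
print(subset.equals(same_subset))  # True
```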

#### Question:
<li>Read 'car_details.csv' file and create a pandas dataframe from it.</li>
<li>Then select only the <b>'name'</b>, <b>'selling_price'</b> and <b>'km_driven'</b> columns from the dataframe.</li>

![](images/selecting_3_cols.png)

#### Selecting Rows From A Pandas DataFrame

<li>Now that we've learned how to select columns by label, let's learn how to select rows using the labels of the index axis.</li>
<li>We can use the same syntax to select rows from a dataframe as we do for columns:</li>
<code>
    df.loc[row_label, column_label]
</code>
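A sketch on a made-up DataFrame whose rows are labeled by name: a single row label returns that row as a Series, and adding a column label returns a single value.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [24, 23, 32], "Gender": ["Male", "Male", "Male"]},
    index=["Prabhat", "Ram", "Shyam"],  # string row labels
)

row = df.loc["Ram"]             # a single row comes back as a Series
print(row["Age"])               # 23
print(df.loc["Ram", "Gender"])  # Male: a single cell
```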



### Selecting Multiple Rows From the DataFrame

![](images/selecting_multiple_rows.png)
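Multiple rows can be selected with a list of labels or with a label slice. A sketch on a made-up DataFrame; note that a label slice in loc[] includes both endpoints, unlike Python list slicing.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [24, 23, 32, 21, 20]},
    index=["Prabhat", "Ram", "Shyam", "Asmita", "Alisha"],
)

# A list of labels selects exactly those rows.
picked = df.loc[["Ram", "Asmita"]]

# A label slice includes BOTH endpoints, unlike Python list slicing.
sliced = df.loc["Ram":"Asmita"]
print(sliced.index.tolist())  # ['Ram', 'Shyam', 'Asmita']
```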

#### Indexing & Slicing In Pandas DataFrame

<li>We can slice a dataset by rows as well as by columns.</li>
<li>If we have data with shape (5, 5) and we want the first three rows and the first three columns, we need to slice both rows and columns to get the desired shape.</li>
<li>We can use the df.iloc[] indexer, which selects by integer position, to do both indexing and slicing in a dataframe.</li>
<li>Let's practice the .iloc[] indexer.</li>
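The (5, 5) example above can be sketched with a made-up DataFrame holding the numbers 0 to 24. Unlike loc[], iloc[] uses integer positions, and the end of a slice is excluded, as in regular Python slicing.

```python
import numpy as np
import pandas as pd

# A hypothetical 5 x 5 DataFrame holding the numbers 0..24.
df = pd.DataFrame(np.arange(25).reshape(5, 5), columns=list("abcde"))

# iloc uses integer positions; the end of a slice is excluded.
top_left = df.iloc[:3, :3]
print(top_left.shape)        # (3, 3)

print(df.iloc[0, 4])         # 4: first row, fifth column
print(df.iloc[-1].tolist())  # last row: [20, 21, 22, 23, 24]
```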


#### Datatype Conversion In Pandas

<li>astype() is one of the most important pandas methods. It is used to change the data type of a Series.</li>
<li>When a pandas dataframe is created from a CSV file, the data type of each column is set automatically.</li>
<li>At times the inferred data type is not what it should be, and this is where we can use astype() to get the desired data type.</li>
<li>For example, a salary column could be imported as strings, but to do arithmetic on it we have to convert it to float.</li>
<li>astype() is used for such data type conversions.</li>
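The salary example can be sketched as follows, using made-up salaries stored as strings. After astype(float), arithmetic such as sum() works numerically instead of on text.

```python
import pandas as pd

# Hypothetical salaries that were read in as strings.
df = pd.DataFrame({"salary": ["50000", "62000.5", "48000"]})

df["salary"] = df["salary"].astype(float)
print(df["salary"].dtype)  # float64
print(df["salary"].sum())  # 160000.5
```

If a column also contains text such as 'not available', astype(float) raises a ValueError; pd.to_numeric(..., errors="coerce") is a common alternative that turns such values into NaN.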

#### Value Counts Method

<li>Since series and dataframes are two distinct objects, they have their own unique methods.</li>

<li>Let's look at an example of a series method - the Series.value_counts() method.</li>

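<li>This method returns each unique non-null value in a Series together with its count, sorted from most to least frequent.</li>

<li>In pandas versions before 1.1.0, value_counts() was a Series-only method, and calling it on a dataframe raised the following error (newer versions add DataFrame.value_counts(), which counts unique rows instead):</li>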

<code>
    AttributeError: 'DataFrame' object has no attribute 'value_counts'
</code>
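A sketch of Series.value_counts() on a made-up Series of names. Missing values are left out by default; passing dropna=False counts them too.

```python
import pandas as pd

names = pd.Series(["Ram", "Asmita", "Ram", None, "Shyam", "Ram", "Asmita"])

# Counts each unique non-null value, most frequent first.
counts = names.value_counts()
print(counts)

# dropna=False also counts the missing values.
print(names.value_counts(dropna=False))
```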

#### Selecting Items From A Series

<li>As with dataframes, we can use Series.loc[] to select items from a series using single labels, a list, or a slice object.</li>
<li>We can also omit loc[] and use bracket shortcuts for all three:</li>

![](images/selecting_series.png)
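The three selection styles can be sketched on a made-up Series of counts, first with loc[] and then with the bracket shortcut:

```python
import pandas as pd

counts = pd.Series([3, 2, 1], index=["Ram", "Asmita", "Shyam"])

print(counts.loc["Ram"])             # single label -> 3
print(counts.loc[["Ram", "Shyam"]])  # list of labels -> Series
print(counts.loc["Ram":"Asmita"])    # label slice, end included

# Bracket shortcuts select the same items.
print(counts["Ram"], counts[["Ram", "Shyam"]].tolist())
```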

#### Question

<li>Use the value_counts() method to check the frequency count of different names in the 'appointment_schedule.csv' file.</li>
<li>Select only the first row from the series.</li>
<li>Select the first row and the last row from the series.</li>
<li>Select the first five rows and the last five rows from the series.</li>



#### DataFrame Vs Series

![](images/dataframe_vs_series.png)

#### Summary

![](images/pandas_selection_summary.png)

#### Vectorized Operations In Pandas

<li>We'll explore how pandas uses many of the concepts we learned in the NumPy lessons.</li>
<li>Because pandas is built on top of NumPy, many NumPy concepts and methods are supported.</li>
<li>Recall that one of the ways NumPy makes working with data easier is with vectorized operations.</li>
<li>Just like with NumPy, we can use any of the standard Python numeric operators with series, including:</li>
<code>
    series_a + series_b - Addition
    series_a - series_b - Subtraction
    series_a * series_b - Multiplication
    series_a / series_b - Division
</code>
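A sketch of these operators on two made-up Series. One behavior that differs from NumPy: pandas aligns Series on their index labels rather than their positions, so a label present in only one Series produces NaN.

```python
import pandas as pd

series_a = pd.Series([10, 20, 30], index=["x", "y", "z"])
series_b = pd.Series([1, 2, 3], index=["x", "y", "z"])

print((series_a + series_b).tolist())  # [11, 22, 33]
print((series_a / series_b).tolist())  # [10.0, 10.0, 10.0]
print((series_a * 2).tolist())         # a scalar applies to every item: [20, 40, 60]

# Operations align on index labels, not positions:
# labels present in only one Series produce NaN.
series_c = pd.Series([1, 2], index=["y", "w"])
aligned = series_a + series_c
print(aligned)
```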

#### Some Statistical Functions In Pandas

<li>Like NumPy, Pandas supports many descriptive stats methods such as mean, median, mode, min, max and so on.</li>
<li>Here are a few of the most useful ones.</li>
<code>
Series.max()
Series.min()
Series.mean()
Series.median()
Series.mode()
Series.sum()
</code>
<li>We can calculate the average value of a particular column (Series) using df.column_name.mean() or df["column_name"].mean().</li>
<li>To calculate the minimum value in a particular column (Series), we can use df.column_name.min().</li>
<li>Similarly, to calculate the maximum value in a particular column (Series), we can use df.column_name.max().</li>
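A sketch using the ages from the earlier example DataFrame. Attribute access such as df.Age only works when the column name is a valid Python identifier; the bracket form always works.

```python
import pandas as pd

df = pd.DataFrame({"Age": [24, 23, 32, 21, 20]})

print(df["Age"].mean())    # 24.0
print(df.Age.min())        # 20 (attribute access works for simple column names)
print(df["Age"].max())     # 32
print(df["Age"].median())  # 23.0
```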

#### Finding the descriptive statistics of the dataframe using .describe() method

<li>Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.</li>
<li>By default, the describe() method computes descriptive statistics for all of the numeric columns in a dataframe.</li>
<li>It can analyze both numeric and object Series, as well as dataframes whose columns have mixed data types.</li>
<li>The output will vary depending on what is provided.</li>
<li>If we want to see the descriptive statistics of object (string) columns, we have to specify <b>df.describe(include = "O")</b>.</li>
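A sketch on a made-up DataFrame with one numeric and one string column: the default call summarizes only Age, while include="O" gives count, unique, top and freq for the string column.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [24, 23, 32, 21],
    "Gender": ["Male", "Male", "Male", "Female"],
})

numeric_summary = df.describe()            # numeric columns only by default
object_summary = df.describe(include="O")  # object (string) columns

print(numeric_summary.loc["mean", "Age"])   # 25.0
print(object_summary.loc["top", "Gender"])  # Male
```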

#### Assigning Values With Pandas

<li>Just like in NumPy, the same techniques that we use to select data could be used for assignment.</li>

<li>When we select a whole column by label and assign a value to it, that value is assigned to every item in the column.</li>

<li>By providing labels for both axes, we can assign a value to a single cell within our dataframe.</li>

<code>
    df.loc[row_label, col_label] = assignment_value
</code>
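Both kinds of assignment, sketched on a made-up DataFrame labeled by name:

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [24, 23, 32], "Gender": ["Male", "Male", "Male"]},
    index=["Prabhat", "Ram", "Shyam"],
)

# Assigning to a whole column sets every item in it.
df["Gender"] = "Unknown"

# Labels on both axes target a single cell.
df.loc["Ram", "Age"] = 25
print(df)
```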

#### Using Boolean Indexing With Pandas Objects (Selection With Condition In Pandas)
<li>We can assign a value by using row label and column label in pandas.</li>
<li>But what if we need to assign the same value to a group of rows that meet the same criteria?</li>
<li>In that case, we can use boolean indexing to change all rows that meet the criteria at once, just like we did with NumPy.</li>


<ol>
    <li>Equals: df['series'] == value</li>
    <li>Not Equals: df['series'] != value</li>
    <li>Less than: df['series'] < value</li>
    <li>Less than or equal to: df['series'] <= value</li>
    <li>Greater than: df['series'] > value</li>
    <li>Greater than or equal to: df['series'] >= value</li>
</ol>
<li>These conditions can be used in several ways, most commonly inside .loc to select values with conditions.</li>
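A sketch on the earlier example data: the comparison produces a boolean mask, which .loc uses both to select matching rows and to assign one value to all of them. The Group column and its labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Prabhat", "Ram", "Shyam", "Asmita", "Alisha"],
    "Age": [24, 23, 32, 21, 20],
})

# A comparison produces a boolean Series (a mask), one value per row.
is_young = df["Age"] < 22

# Selecting with the mask keeps only rows where it is True.
print(df.loc[is_young, "Name"].tolist())  # ['Asmita', 'Alisha']

# Start with a default value, then overwrite only the matching rows.
df["Group"] = "22 and over"
df.loc[is_young, "Group"] = "Under 22"
print(df)
```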

### Using Pandas Method To Create a Boolean Mask

<li>In the previous lessons, we used Python comparison operators to create boolean masks to select subsets of data.</li>

<li>There are also a number of pandas methods that return boolean masks useful for exploring data.</li>

<li>Two examples are the Series.isnull() method and the Series.notnull() method.</li>
<li>The Series.isnull() method can be used to select rows that contain null (NaN) values in a certain column.</li>
<li>Similarly, Series.notnull() method is used to select rows that do not contain null values for a certain column.</li>
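A sketch using a few made-up weather rows: each method returns a boolean mask that can be used directly inside the brackets to filter rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "day": ["1/1/2017", "1/4/2017", "1/5/2017"],
    "temperature": [32.0, np.nan, -1.0],
})

# Rows where temperature is missing.
missing = df[df["temperature"].isnull()]

# Rows where temperature is present.
present = df[df["temperature"].notnull()]

print(missing["day"].tolist())  # ['1/4/2017']
print(present["day"].tolist())  # ['1/1/2017', '1/5/2017']
```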

#### Sorting Values
<li>We can use the DataFrame.sort_values() method to sort the rows on a particular column.</li>
<li>To do so, we pass the column name to the method:</li>
<code>
sorted_rows = df.sort_values("column_name")
</code>
<li>By default, the sort_values() method sorts the rows in ascending order, from smallest to largest.</li>
<li>To sort the rows in descending order instead, we can set the ascending parameter to False:</li>
<code>
    sorted_rows = df.sort_values("column_name", ascending=False)
</code>
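A sketch on the earlier example data showing both orders. sort_values() returns a new DataFrame and leaves the original unchanged; each row keeps its original index label.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Prabhat", "Ram", "Shyam", "Asmita", "Alisha"],
    "Age": [24, 23, 32, 21, 20],
})

youngest_first = df.sort_values("Age")
oldest_first = df.sort_values("Age", ascending=False)

print(youngest_first["Name"].tolist())  # ['Alisha', 'Asmita', 'Ram', 'Prabhat', 'Shyam']
print(oldest_first["Name"].tolist())    # ['Shyam', 'Prabhat', 'Ram', 'Asmita', 'Alisha']
```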


### String Manipulation In Pandas DataFrame

<li>String manipulation is the process of changing, parsing, splitting, 'cleaning' or analyzing strings.</li>
<li>Sometimes string data is not in a form that is suitable for analysis or for describing the data.</li>
<li>Python is well known for its ability to manipulate strings.</li>
<li>Pandas provides built-in string methods, accessed through the <b>.str</b> accessor of a Series, to modify and process string data in a dataframe.</li>
<li>Some of the most useful pandas string processing functions are as follows:</li>
<ol>
    <li><b>lower()</b></li>
    <li><b>upper()</b></li>
    <li><b>strip()</b></li>
    <li><b>split()</b></li>
    <li><b>get_dummies()</b></li>
    <li><b>startswith()</b></li>
    <li><b>endswith()</b></li>
    <li><b>replace()</b></li>
    <li><b>contains()</b></li>
</ol>


#### 1. lower(): 
<li>It converts all uppercase characters in strings in the dataframe to lower case and returns the lowercase strings in the result.</li>


#### 2. upper():
<li>It converts all lowercase characters in strings in the dataframe to upper case and returns the uppercase strings in result.</li>


#### 3. strip():
<li>If there are spaces at the beginning or end of a string, we should trim them using the strip() method.</li>
<li>It removes leading and trailing whitespace from each string in a Series; spaces inside the string are kept.</li>
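The first three methods, sketched on a made-up Series of names with inconsistent case and stray spaces. Each is called through the .str accessor and returns a new Series.

```python
import pandas as pd

names = pd.Series(["  Prabhat ", "RAM", "shyam  "])

cleaned = names.str.strip()          # remove leading/trailing spaces
print(cleaned.str.lower().tolist())  # ['prabhat', 'ram', 'shyam']
print(cleaned.str.upper().tolist())  # ['PRABHAT', 'RAM', 'SHYAM']
```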


#### 4. split(' '):
<li>It splits each string with the given pattern.</li>
<li>Strings are split and the new elements after the performed split operation, are stored in a list.</li>


#### 5. get_dummies(): 
<li>It returns a DataFrame of one-hot encoded values: each column represents one distinct value, and a cell is 1 if that value is present in the corresponding row and 0 otherwise.</li>
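A sketch of split() and get_dummies() on made-up values in the style of the car_details columns. Series.str.get_dummies() splits each string on '|' by default, so single values like these become one column per distinct value.

```python
import pandas as pd

cars = pd.Series(["Maruti 800 AC", "Hyundai Verna 1.6 SX"])

# split(' ') returns a list of pieces for each string.
parts = cars.str.split(" ")
print(parts.tolist())  # [['Maruti', '800', 'AC'], ['Hyundai', 'Verna', '1.6', 'SX']]

# get_dummies() one-hot encodes the values: one column per distinct value.
fuel = pd.Series(["Petrol", "Diesel", "Petrol"])
dummies = fuel.str.get_dummies()
print(dummies)
```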


#### 6. startswith(pattern):
<li>It returns True for each string in the Series that starts with the pattern, and False otherwise.</li>
<li>To keep only the rows whose values start with 'ind', you can use df[df[col].str.startswith('ind')].</li>


#### 7. endswith(pattern):
<li>It returns True for each string in the Series that ends with the pattern, and False otherwise.</li>
<li>To keep only the rows whose values end with 'es', you can use df[df[col].str.endswith('es')].</li>


#### 8. replace(a,b):
<li>It replaces the value a with the value b.</li>
<li>If you wanted to remove white space characters then you can use replace() method as:</li>
<code>
df[col_name].str.replace(" ", "")
</code>


#### 9. contains():
<li>contains() method checks whether the string contains a particular substring or not.</li>
<li>The function is quite similar to replace() but instead of replacing the string itself it just returns the boolean value True or False.</li>
<li>If a substring is present in a string, then it returns boolean value True else False.</li>
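The last four methods, sketched on a made-up seller_type column. startswith() and endswith() are case-sensitive, which is why 'individual' is not matched, while contains() accepts case=False to ignore case.

```python
import pandas as pd

df = pd.DataFrame({"seller_type": ["Individual", "Dealer", "Trustmark Dealer", "individual"]})
col = df["seller_type"]

print(df[col.str.startswith("Ind")]["seller_type"].tolist())   # ['Individual']
print(df[col.str.endswith("Dealer")]["seller_type"].tolist())  # ['Dealer', 'Trustmark Dealer']
print(col.str.replace(" ", "").tolist())                       # spaces removed
print(col.str.contains("deal", case=False).tolist())           # [False, True, True, False]
```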



#### Handling Missing Values
<li>We can use the fillna() method in pandas to fill missing values in different ways.</li>
<li>We can use the dropna() method to drop rows with missing values.</li>
<li>We can also fill missing values with the mean, median or mode of a column, depending on its values.</li>
<li>Filling missing values with the mean is appropriate when the column has continuous values without strong skew; for skewed numeric data the median is usually more robust.</li>
<li>If the data is categorical, filling missing values with the mode (the most frequent value) is a good idea.</li>
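A sketch of both approaches on a made-up DataFrame: dropna() removes incomplete rows, while fillna() fills the numeric column with its mean and the categorical column with its mode. mode() returns a Series, so [0] takes its first value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [32.0, np.nan, -1.0, 26.0],
    "event": ["Rain", "Sunny", np.nan, "Sunny"],
})

# Drop every row that has at least one missing value.
complete_rows = df.dropna()

# Fill a numeric column with its mean and a categorical column with its mode.
filled = df.copy()
filled["temperature"] = filled["temperature"].fillna(filled["temperature"].mean())
filled["event"] = filled["event"].fillna(filled["event"].mode()[0])

print(complete_rows)
print(filled)
```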