# Pandas exercises


Pandas is a Python library that makes it easy to manipulate, analyse, clean and explore data.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". 

## Exercise 1 - List-to-Series Conversion
Given a list, output the corresponding pandas Series.

```python
import pandas as pd

a = [1, 2, 3]
b = pd.Series(a)
print(b)
```

```
0    1
1    2
2    3
dtype: int64
```


## Exercise 2 - List-to-Series Conversion with Custom Indexing
Given a list, output the corresponding pandas Series with odd index labels only.

```python
a = [1, 2, 3, 4, 5]

b = pd.Series(a, index=[1, 3, 5, 7, 9])

print(b)
```

```
1    1
3    2
5    3
7    4
9    5
dtype: int64
```
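
Typing the labels out works for five elements. As a sketch, assuming the labels should be the odd numbers starting at 1 (one label per list element), they can also be derived from the list length with `range`:

```python
import pandas as pd

a = [1, 2, 3, 4, 5]

# Build the odd labels 1, 3, 5, ... with exactly one label per element,
# instead of typing them out by hand.
odd_labels = list(range(1, 2 * len(a), 2))

b = pd.Series(a, index=odd_labels)
print(b)
```

This keeps working if `a` grows or shrinks, because the number of labels always matches `len(a)`.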


## Exercise 3 - Date Series Generation
Generate the series of dates from 1st May, 2021 to 12th May, 2021 (both inclusive)

```python
date_series = pd.date_range(start='2021-05-01', end='2021-05-12')

print(date_series)
```

```
DatetimeIndex(['2021-05-01', '2021-05-02', '2021-05-03', '2021-05-04',
               '2021-05-05', '2021-05-06', '2021-05-07', '2021-05-08',
               '2021-05-09', '2021-05-10', '2021-05-11', '2021-05-12'],
              dtype='datetime64[ns]', freq='D')
```
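
Note that `pd.date_range` returns a `DatetimeIndex`, not a `Series`. A minimal sketch, assuming a true `Series` is what the exercise wants, wraps the index in `pd.Series`; it also shows the equivalent `periods` form of the same 12-day range:

```python
import pandas as pd

# The same 12 days expressed as a start date plus a number of periods.
# With the default daily frequency, the first and last dates are both included.
date_series = pd.date_range(start="2021-05-01", periods=12)

# Wrap the DatetimeIndex in a Series (default integer index, datetime values).
dates_as_series = pd.Series(date_series)

print(dates_as_series)
```

Using `end=` or `periods=` is a matter of taste here; `periods=` is handy when the length of the range is known but the end date is not.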


## Exercise 3 - Dictionary-to-Dataframe Conversion
Given a dictionary, convert it into the corresponding DataFrame and display it:


```python
dictionary = {'name': ['Vinay', 'Kushal', 'Aman'],
              'age': [22, 25, 24],
              'occ': ['engineer', 'doctor', 'accountant']}
```

```python
dictionary = {'name': ['Vinay', 'Kushal', 'Aman'],
              'age': [22, 25, 24],
              'occ': ['engineer', 'doctor', 'accountant']}
dataframe = pd.DataFrame(dictionary)

print(dataframe)
```

```
     name  age         occ
0   Vinay   22    engineer
1  Kushal   25      doctor
2    Aman   24  accountant
```


## Exercise 4 - Setting a Custom Index in a Dataframe
Given a dataframe, replace its default integer index with one of its columns.

Please use the dataframe generated in Exercise 3 (Dictionary-to-Dataframe Conversion).

```python
# This cell works on the California housing sample shipped in Colab's
# sample_data folder, not on the Exercise 3 dataframe.
dataframe = pd.read_csv('/content/sample_data/california_housing_test.csv')

# Replace the default integer index with the housing_median_age column.
dataframe_customindex = dataframe.set_index('housing_median_age')

print(dataframe_customindex)
```

```
                    longitude  latitude  total_rooms  total_bedrooms  \
housing_median_age                                                     
27.0                  -122.05     37.37       3885.0           661.0   
43.0                  -118.30     34.26       1510.0           310.0   
27.0                  -117.81     33.78       3589.0           507.0   
28.0                  -118.36     33.82         67.0            15.0   
19.0                  -119.67     36.33       1241.0           244.0   
...                       ...       ...          ...             ...   
23.0                  -119.86     34.42       1450.0           642.0   
27.0                  -118.14     34.06       5257.0          1082.0   
10.0                  -119.70     36.30        956.0           201.0   
40.0                  -117.12     34.10         96.0            14.0   
42.0                  -119.63     34.42       1765.0           263.0   

                    population  households  median_income  medi
```
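
The cell above applies `set_index` to the California housing sample rather than to the Exercise 3 dataframe. A minimal sketch on the Exercise 3 data follows; choosing `'name'` as the new index is an assumption (any column would work):

```python
import pandas as pd

# Rebuild the dataframe from Exercise 3 (Dictionary-to-Dataframe Conversion).
dictionary = {'name': ['Vinay', 'Kushal', 'Aman'],
              'age': [22, 25, 24],
              'occ': ['engineer', 'doctor', 'accountant']}
dataframe = pd.DataFrame(dictionary)

# Use the 'name' column as the index; by default (drop=True) it is removed
# from the regular data columns.
dataframe_customindex = dataframe.set_index('name')

print(dataframe_customindex)

# Rows can now be looked up by label instead of by integer position.
print(dataframe_customindex.loc['Kushal', 'age'])
```

`set_index` returns a new dataframe and leaves `dataframe` unchanged; `reset_index()` turns the index back into a regular column.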

## Exercise 5 - Sorting a Dataframe by Multiple Columns
Use the dataframe generated in Exercise 3 (Dictionary-to-Dataframe Conversion) and sort it by multiple columns: 'id' and 'age'.

```python
# `dataframe` still holds the California housing data read in Exercise 4.
# sort_index() orders the rows by their index labels only; it does not sort by columns.
dataframe_sorted = dataframe.sort_index()

print(dataframe_sorted)
```

```
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \
0       -122.05     37.37                27.0       3885.0           661.0   
1       -118.30     34.26                43.0       1510.0           310.0   
2       -117.81     33.78                27.0       3589.0           507.0   
3       -118.36     33.82                28.0         67.0            15.0   
4       -119.67     36.33                19.0       1241.0           244.0   
...         ...       ...                 ...          ...             ...   
2995    -119.86     34.42                23.0       1450.0           642.0   
2996    -118.14     34.06                27.0       5257.0          1082.0   
2997    -119.70     36.30                10.0        956.0           201.0   
2998    -117.12     34.10                40.0         96.0            14.0   
2999    -119.63     34.42                42.0       1765.0           263.0   

      population  households  median_income  median_house_value
```
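
`sort_index()` orders rows by the index only, so it does not sort by any column. The Exercise 3 dataframe also has no `'id'` column; the sketch below assumes `'age'` followed by `'name'` as a stand-in and uses `sort_values` with a list of columns:

```python
import pandas as pd

dictionary = {'name': ['Vinay', 'Kushal', 'Aman'],
              'age': [22, 25, 24],
              'occ': ['engineer', 'doctor', 'accountant']}
dataframe = pd.DataFrame(dictionary)

# Sort by 'age' first; 'name' is only used to break ties between equal ages.
# `ascending` takes one flag per column listed in `by`.
dataframe_sorted = dataframe.sort_values(by=['age', 'name'], ascending=[True, True])

print(dataframe_sorted)
```

The original index labels travel with their rows, so the sorted frame's index is no longer 0, 1, 2; add `ignore_index=True` to `sort_values` if a fresh index is preferred.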

## Exercise 6 - Conditional Selection of Rows in a DataFrame
Use the dataframe generated in Exercise 3 (Dictionary-to-Dataframe Conversion) and select rows based on a condition: age > 24.

```python
# `dataframe` still holds the California housing data read in Exercise 4;
# keep only the rows whose housing_median_age is at least 40.
dataframe_condition = dataframe.loc[dataframe.housing_median_age >= 40]

print(dataframe_condition)
```

```
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \
1       -118.30     34.26                43.0       1510.0           310.0   
6       -121.43     38.63                43.0       1009.0           225.0   
10      -118.24     33.98                45.0        972.0           249.0   
15      -117.99     33.81                42.0        161.0            40.0   
20      -122.15     37.75                40.0       1445.0           256.0   
...         ...       ...                 ...          ...             ...   
2988    -122.01     36.97                43.0       2162.0           509.0   
2990    -118.23     34.09                49.0       1638.0           456.0   
2992    -122.33     37.39                52.0        573.0           102.0   
2998    -117.12     34.10                40.0         96.0            14.0   
2999    -119.63     34.42                42.0       1765.0           263.0   

      population  households  median_income  median_house_value
```
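
The cell above filters the California housing data on `housing_median_age >= 40`. Applied to the Exercise 3 dataframe with the condition from the exercise (`age > 24`), a boolean-mask sketch looks like this; the second, combined condition is only an illustration:

```python
import pandas as pd

dictionary = {'name': ['Vinay', 'Kushal', 'Aman'],
              'age': [22, 25, 24],
              'occ': ['engineer', 'doctor', 'accountant']}
dataframe = pd.DataFrame(dictionary)

# The comparison yields a boolean Series; .loc keeps the rows where it is True.
mask = dataframe['age'] > 24
dataframe_condition = dataframe.loc[mask]
print(dataframe_condition)

# Several conditions combine with & (and) or | (or); each comparison needs
# its own parentheses because & and | bind more tightly than > and ==.
engineers_over_21 = dataframe.loc[(dataframe['age'] > 21) & (dataframe['occ'] == 'engineer')]
print(engineers_over_21)
```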

## Exercise 7 - Reading and Writing Files
Pandas can read CSV, JSON, Excel and many other file formats. Work through the following tutorial to learn more about the supported formats (you will want to read all your data sources with Pandas afterwards):

https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html#min-tut-02-read-write
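
As a small, self-contained sketch of the `read_*`/`to_*` pairs the tutorial covers, the example below round-trips a dataframe through CSV and JSON in memory. `io.StringIO` stands in for a real file path, and the data is only illustrative:

```python
import io

import pandas as pd

dataframe = pd.DataFrame({'name': ['Vinay', 'Kushal', 'Aman'],
                          'age': [22, 25, 24]})

# With no path given, to_csv returns the CSV text as a string;
# index=False leaves out the default integer index.
csv_text = dataframe.to_csv(index=False)

# read_csv accepts any file-like object, so a StringIO replaces a file path.
from_csv = pd.read_csv(io.StringIO(csv_text))

# The same round trip with JSON; orient="records" writes one object per row.
json_text = dataframe.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(from_csv)
print(from_json)
```

With real files, the string buffers are simply replaced by paths, e.g. `pd.read_csv('data.csv')` or `dataframe.to_json('data.json')` (file names hypothetical).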

## Exercise 8 - Exploration of the salaries dataframe (optional)
Using the provided salaries.csv file, try to answer the following questions:

1) What is the type of the salary column?
2) What is the type of each column?
3) Select the salary column and display its highest value.
4) Select the first 20 rows of the dataframe.
5) Display the last 2 rows of the dataframe.
6) Give the summary statistics for the numeric columns in the dataset.
7) Calculate the standard deviation of each numeric column.
8) What is the average salary?
9) What is the most frequent grade in the dataframe?
10) What is the least frequent grade in the dataframe?
11) Structure your code to have two functions: one for reading and …
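
A possible structure for these questions is sketched below. It assumes `salaries.csv` has columns named `salary` and `grade` (the file itself is not shown here), and because the description of the second function is cut off, pairing the reading function with a hypothetical `explore` function is an assumption. The rows in `sample_csv` are made up purely so the sketch runs without the file:

```python
import io

import pandas as pd


def read_salaries(source):
    """Read the salaries data from a path or file-like object."""
    return pd.read_csv(source)


def explore(df):
    """Answer the exploration questions (assumes 'salary' and 'grade' columns)."""
    grade_counts = df['grade'].value_counts()
    return {
        'salary_dtype': df['salary'].dtype,             # question 1
        'all_dtypes': df.dtypes,                        # question 2
        'max_salary': df['salary'].max(),               # question 3
        'first_20': df.head(20),                        # question 4
        'last_2': df.tail(2),                           # question 5
        'summary': df.describe(),                       # question 6
        'std': df.std(numeric_only=True),               # question 7
        'mean_salary': df['salary'].mean(),             # question 8
        'most_frequent_grade': grade_counts.idxmax(),   # question 9
        'least_frequent_grade': grade_counts.idxmin(),  # question 10
    }


# Hypothetical sample rows for illustration only; the real salaries.csv may differ.
sample_csv = io.StringIO(
    "grade,salary\n"
    "A,50000\n"
    "B,42000\n"
    "A,61000\n"
    "C,38000\n"
    "B,45000\n"
    "A,52000\n"
)

answers = explore(read_salaries(sample_csv))
print(answers['max_salary'], answers['mean_salary'])
print(answers['most_frequent_grade'], answers['least_frequent_grade'])
```

With the real file, `read_salaries('salaries.csv')` would replace the in-memory sample. If several grades tie for most or least frequent, `idxmax`/`idxmin` return only the first one in the counts; `df['grade'].mode()` returns every tied most-frequent value.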

Pandas documentation: https://pandas.pydata.org/docs/getting_started/index.html