<img src="images/pandas-intro.png">

# Learning Agenda of this Notebook:
- What is Pandas and how is it used in AI?
- Key features of Pandas
- Data Types in Pandas
- What does Pandas deal with?

- Creating Series in Pandas
    - From Python List
    - From NumPy Arrays
    - From Python Dictionary
    - From a scalar value
    - Creating empty series object
- Attributes of a Pandas Series
- Arithmetic Operations on Series

- Dataframes in Pandas
    - Anatomy of a Dataframe
    - Creating Dataframe
        - An empty dataframe
        - Two-Dimensional NumPy Array
        - Dictionary of Python Lists
        - Dictionary of Pandas Series
    - Attributes of a Dataframe
    - Bonus
- Different file formats in Pandas 
- Indexing, Subsetting and Slicing Dataframes
    - Practice Exercise I
- Modifying Dataframes
- Data Handling with Pandas
  - Practice Exercise I
  - Practice Exercise II
- All Statistical functions in Pandas
- Input/Output Operations
- Aggregation & Grouping
  - Practice Exercise
- Merging, Joining and Concatenation
  - Practice Exercise
- How To Perform Data Visualization with Pandas
- Exercise I
- Exercise II
- Pandas Assignment

### Our Main Problem :
Given the dataset below, find the minimum temperature of each city.

In [2]:
import pandas as pd
df = pd.read_csv('datasets/groupbydata2.csv')
df

Unnamed: 0,date,city,temperature,humidity
0,01/01/2022,lahore,8,60
1,02/01/2022,lahore,10,58
2,03/01/2022,lahore,5,51
3,04/01/2022,lahore,6,49
4,05/01/2022,lahore,12,54
5,01/01/2022,karachi,18,74
6,02/01/2022,karachi,10,71
7,03/01/2022,karachi,12,78
8,04/01/2022,karachi,15,76
9,05/01/2022,karachi,16,70


In [3]:
df.city.unique() # or df['city'].unique()

array(['lahore', 'karachi', 'murree'], dtype=object)

In [4]:
# find minimum temperature for each city
print(df[df['city'] == 'lahore']['temperature'].min())
print(df[df['city'] == 'karachi']['temperature'].min())
print(df[df['city'] == 'murree']['temperature'].min())

5
10
-7


In [5]:
# solution
df.groupby('city')['temperature'].min()

city
karachi    10
lahore      5
murree     -7
Name: temperature, dtype: int64

## Learning agenda of this notebook
1. Overview of Aggregation Functions and the `agg()` method
    - Applying a Built-in Aggregation Function on Entire Dataframe Object
    - Applying a Built-in Aggregation Function on a Series Object
    - Applying a User-Defined/Lambda Function on a Series Object<br><br>
2. Computing the Minimum Temperature of each City the **hard way**<br><br>
3. Computing the Minimum Temperature of each City using **`groupby`**<br><br>
4. Practice GroupBy on Stack Overflow Survey Dataset

## 1. Overview of Aggregation Functions and the `agg()` Method
- An aggregation function takes multiple individual values and reduces them to a single summary value, such as a minimum, a count, or a mean.
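As a minimal illustration of this definition, the sketch below reduces a small series to single values. It uses the five Lahore temperature readings from the dataset; the variable names are chosen only for this example.

```python
import pandas as pd

# Five Lahore temperature readings taken from the dataset used in this notebook
temps = pd.Series([8, 10, 5, 6, 12], name='temperature')

# Each aggregation function collapses the whole series into one value
lowest = temps.min()
highest = temps.max()
average = temps.mean()

print(lowest, highest, average)
```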

In [6]:
import pandas as pd
df = pd.read_csv('datasets/groupbydata2.csv')
df

Unnamed: 0,date,city,temperature,humidity
0,01/01/2022,lahore,8,60
1,02/01/2022,lahore,10,58
2,03/01/2022,lahore,5,51
3,04/01/2022,lahore,6,49
4,05/01/2022,lahore,12,54
5,01/01/2022,karachi,18,74
6,02/01/2022,karachi,10,71
7,03/01/2022,karachi,12,78
8,04/01/2022,karachi,15,76
9,05/01/2022,karachi,16,70


### a. Applying a Built-in Aggregation Function on Entire Dataframe Object

In [7]:
df.min(numeric_only=True)

temperature    -7
humidity       49
dtype: int64

In [8]:
df.count()

date           15
city           15
temperature    15
humidity       15
dtype: int64

In [9]:
# median() is defined for numeric columns only; without numeric_only=True, older pandas versions warn and pandas 2.0+ raises a TypeError
df.median()

temperature     8.0
humidity       68.0
dtype: float64

In [10]:
df.median(numeric_only=True)

temperature     8.0
humidity       68.0
dtype: float64

> We can call the `agg()` method on the dataframe to apply multiple aggregation functions at a time, by passing the `agg()` function a list of aggregation functions as strings.

In [13]:
df.agg(['min', 'max',  'count'])

Unnamed: 0,date,city,temperature,humidity
min,01/01/2022,karachi,-7,49
max,05/01/2022,murree,18,78
count,15,15,15,15


In [16]:
df[df.city=='lahore'].agg({'temperature':['min', 'max',  'count']})

Unnamed: 0,temperature
min,5
max,12
count,5


In [18]:
df[df.city=='lahore'].agg({'temperature':['min', 'max',  'count'],
       'humidity':['max', 'min','mean']})

Unnamed: 0,temperature,humidity
min,5.0,49.0
max,12.0,60.0
count,5.0,
mean,,54.4


> We can call the `describe()` method on the dataframe to get descriptive statistical measures on all its numeric columns.

In [19]:
df.describe()

Unnamed: 0,temperature,humidity
count,15.0,15.0
mean,6.133333,64.933333
std,8.253715,9.153194
min,-7.0,49.0
25%,-2.0,59.0
50%,8.0,68.0
75%,12.0,71.5
max,18.0,78.0


### b. Applying a Built-in Aggregation Function on a Series Object

In [20]:
df['temperature'].min()

-7

In [21]:
df['temperature'].max()

18

In [22]:
df['temperature'].mean()

6.133333333333334

> We can call the `agg()` method on a series to apply multiple aggregation functions at a time, by passing the `agg()` function a list of aggregation functions as strings.

In [23]:
df['temperature'].agg(['min', 'max', 'mean', 'count'])

min      -7.000000
max      18.000000
mean      6.133333
count    15.000000
Name: temperature, dtype: float64

> We can call the `describe()` method on a series to get descriptive statistical measures for that single column.

In [24]:
df['temperature'].describe()

count    15.000000
mean      6.133333
std       8.253715
min      -7.000000
25%      -2.000000
50%       8.000000
75%      12.000000
max      18.000000
Name: temperature, dtype: float64

### c. Applying a User-Defined/Lambda Function on a Series Object using the `apply()` Method
- We have used the `apply()` method before. It invokes a function on every value of a Series and returns the results as a new Series.

In [25]:
df.temperature

0      8
1     10
2      5
3      6
4     12
5     18
6     10
7     12
8     15
9     16
10    -5
11    -3
12    -4
13    -1
14    -7
Name: temperature, dtype: int64

In [26]:
def ctof(x):
    return x*9/5+32

In [27]:
df.temperature.apply(ctof)

0     46.4
1     50.0
2     41.0
3     42.8
4     53.6
5     64.4
6     50.0
7     53.6
8     59.0
9     60.8
10    23.0
11    26.6
12    24.8
13    30.2
14    19.4
Name: temperature, dtype: float64

In [28]:
df.temperature.apply(lambda x: x*9/5+32)

0     46.4
1     50.0
2     41.0
3     42.8
4     53.6
5     64.4
6     50.0
7     53.6
8     59.0
9     60.8
10    23.0
11    26.6
12    24.8
13    30.2
14    19.4
Name: temperature, dtype: float64
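For simple arithmetic like this conversion, the same result can usually be obtained without `apply()` by writing the formula on the series directly. The sketch below compares both forms on a few Celsius values from the dataset; the variable names are illustrative only.

```python
import pandas as pd

# A few Celsius readings from the dataset
temperature = pd.Series([8, 10, 5, -7], name='temperature')

# apply() calls the function once per element
via_apply = temperature.apply(lambda x: x * 9 / 5 + 32)

# Arithmetic written on the series itself is applied to every element at once
vectorized = temperature * 9 / 5 + 32

print(vectorized.equals(via_apply))
```

`apply()` remains the right tool when the logic cannot be expressed as plain arithmetic, for example when it needs `if`/`else` branches.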

# How to Compute the Minimum Temperature of Each City?

## 2. Doing it the Hard Way
<img align="center" width="700" height="500"  src="images/groupbyfinal.png"  >

In [29]:
import pandas as pd
df = pd.read_csv('datasets/groupbydata1.csv')
df

Unnamed: 0,date,city,temperature
0,01/01/2022,lahore,8
1,02/01/2022,lahore,10
2,03/01/2022,lahore,5
3,04/01/2022,lahore,6
4,05/01/2022,lahore,12
5,01/01/2022,karachi,18
6,02/01/2022,karachi,10
7,03/01/2022,karachi,12
8,04/01/2022,karachi,15
9,05/01/2022,karachi,16


### a. Splitting the Dataframe
- We use conditional selection: a Boolean mask built from the `city` column selects the rows of one city. There are two ways to do it:
    - Using the `df[]` subscript operator
    - Using the `df.loc[]` indexer

In [30]:
df[df['city']=='karachi']

Unnamed: 0,date,city,temperature
5,01/01/2022,karachi,18
6,02/01/2022,karachi,10
7,03/01/2022,karachi,12
8,04/01/2022,karachi,15
9,05/01/2022,karachi,16


In [31]:
df[df['city']=='lahore']
df.loc[df.city=='lahore', :]

Unnamed: 0,date,city,temperature
0,01/01/2022,lahore,8
1,02/01/2022,lahore,10
2,03/01/2022,lahore,5
3,04/01/2022,lahore,6
4,05/01/2022,lahore,12


In [32]:
df[df['city']=='karachi']
df.loc[df.city=='karachi', :]

Unnamed: 0,date,city,temperature
5,01/01/2022,karachi,18
6,02/01/2022,karachi,10
7,03/01/2022,karachi,12
8,04/01/2022,karachi,15
9,05/01/2022,karachi,16


In [33]:
df[df['city']=='murree']
df.loc[df.city=='murree', :]

Unnamed: 0,date,city,temperature
10,01/01/2022,murree,-5
11,02/01/2022,murree,-3
12,03/01/2022,murree,-4
13,04/01/2022,murree,-1
14,05/01/2022,murree,-7


>**Limitation:**
>- We have to repeat this process for every city separately.
>- What if there are over 100 cities in the dataset?

### b. Applying the `min()` Function
- We need to apply the `min()` function on the temperature column of all of the above dataframes separately

In [34]:
df.loc[df.city=='lahore', :].temperature.min()

5

In [36]:
df.loc[df.city=='karachi', :].temperature.min()

10

In [37]:
df.loc[df.city=='murree', :].temperature.min()

-7

>**Limitation:**
>- We have to repeat this process for every city separately.
>- What if there are over 100 cities in the dataset?

### c. Combining the Result
- Now that we have the minimum temperature of every city, we combine the values into a single series object that can be used for later processing.

In [38]:
lhr = df.loc[df.city=='lahore', :].temperature.min()
kci = df.loc[df.city=='karachi', :].temperature.min()
murree = df.loc[df.city=='murree', :].temperature.min()

s = pd.Series(data=[lhr, kci, murree], index=['L_min', 'K_min', 'M_min'] )
s.name= 'Min Temperatures'
s

L_min     5
K_min    10
M_min    -7
Name: Min Temperatures, dtype: int64
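One way to avoid repeating the selection by hand for every city is to loop over the unique city names. The sketch below rebuilds the temperature data inline so that it is self-contained; `min_by_city` is a name introduced only for this example. It is still the "hard way", just automated.

```python
import pandas as pd

# Rebuild the city/temperature columns shown above so the sketch is self-contained
df = pd.DataFrame({
    'city': ['lahore'] * 5 + ['karachi'] * 5 + ['murree'] * 5,
    'temperature': [8, 10, 5, 6, 12, 18, 10, 12, 15, 16, -5, -3, -4, -1, -7],
})

# Split, apply and combine by hand, once per unique city
min_by_city = {}
for city in df['city'].unique():
    min_by_city[city] = df.loc[df['city'] == city, 'temperature'].min()

s = pd.Series(min_by_city, name='Min Temperatures')
print(s)
```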

# How to Compute the Minimum Temperature of Each City?

## 3. An Elegant Way
<img align="center" width="700" height="500"  src="images/groupbyfinal.png"  >

In [39]:
import pandas as pd
df = pd.read_csv('datasets/groupbydata1.csv')
df

Unnamed: 0,date,city,temperature
0,01/01/2022,lahore,8
1,02/01/2022,lahore,10
2,03/01/2022,lahore,5
3,04/01/2022,lahore,6
4,05/01/2022,lahore,12
5,01/01/2022,karachi,18
6,02/01/2022,karachi,10
7,03/01/2022,karachi,12
8,04/01/2022,karachi,15
9,05/01/2022,karachi,16


### a. Step 1: Split Step
- In the split step, we divide the data inside the dataframe into multiple groups.
- Since we need the minimum temperature of each city, we call the `groupby()` method on the `city` column of the dataframe.
- This returns a `DataFrameGroupBy` object. Iterating over it yields one `(group key, sub-dataframe)` pair per distinct value of the `by` argument passed to `groupby()`.

In [40]:
dfgb = df.groupby('city')
dfgb

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f4eaf3bb6d0>

>- Since this object is iterable, let us iterate over it :)

In [41]:
for mydf in dfgb:
    print(mydf)

('karachi',          date     city  temperature
5  01/01/2022  karachi           18
6  02/01/2022  karachi           10
7  03/01/2022  karachi           12
8  04/01/2022  karachi           15
9  05/01/2022  karachi           16)
('lahore',          date    city  temperature
0  01/01/2022  lahore            8
1  02/01/2022  lahore           10
2  03/01/2022  lahore            5
3  04/01/2022  lahore            6
4  05/01/2022  lahore           12)
('murree',           date    city  temperature
10  01/01/2022  murree           -5
11  02/01/2022  murree           -3
12  03/01/2022  murree           -4
13  04/01/2022  murree           -1
14  05/01/2022  murree           -7)
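Because each item is a tuple, the loop can unpack it into the group key and the sub-dataframe. A minimal sketch, using an inline copy of the city and temperature columns; `group_shapes` is a name introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['lahore'] * 5 + ['karachi'] * 5 + ['murree'] * 5,
    'temperature': [8, 10, 5, 6, 12, 18, 10, 12, 15, 16, -5, -3, -4, -1, -7],
})

# Each item produced by the groupby object is a (group key, sub-dataframe) pair
group_shapes = {}
for city, group in df.groupby('city'):
    group_shapes[city] = group.shape
    print(city, group.shape)
```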


>- To display the row indices of every group, use the `groups` attribute of the `DataFrameGroupBy` object.
>- It returns a dictionary-like object (`PrettyDict`) whose keys are the group values and whose values are the corresponding row index labels.

In [42]:
dfgb.groups   # df.groupby('city').groups

{'karachi': [5, 6, 7, 8, 9], 'lahore': [0, 1, 2, 3, 4], 'murree': [10, 11, 12, 13, 14]}

>- To display the records of a specific group, use the `get_group()` method of the `DataFrameGroupBy` object.
>- It constructs and returns a DataFrame containing the rows of the group with the given name.

In [43]:
# Display the DataFrame of a specific group by passing the group value to get_group()
dfgb.get_group('murree') # df.groupby('city').get_group('murree')

Unnamed: 0,date,city,temperature
10,01/01/2022,murree,-5
11,02/01/2022,murree,-3
12,03/01/2022,murree,-4
13,04/01/2022,murree,-1
14,05/01/2022,murree,-7


>- To find the size of each group, use the `size()` method of the `DataFrameGroupBy` object.
>- It returns a Series containing the number of rows in each group.

In [44]:
dfgb.size()  #df.groupby('city').size()

city
karachi    5
lahore     5
murree     5
dtype: int64

> After understanding the `groupby()` method let us move to step 2, and that is `Applying a Function`

### b. Step 2: Apply Step
- In the second step, we apply an appropriate aggregation function to every group inside the `DataFrameGroupBy` object.

**Let us first apply aggregate function on a specific column of `DataFrameGroupBy` object, which is a `SeriesGroupBy` object**

In [None]:
df

In [45]:
df.groupby('city')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f4eaf3ac4c0>

In [46]:
df.groupby('city').get_group('lahore')

Unnamed: 0,date,city,temperature
0,01/01/2022,lahore,8
1,02/01/2022,lahore,10
2,03/01/2022,lahore,5
3,04/01/2022,lahore,6
4,05/01/2022,lahore,12


In [47]:
df.groupby('city').get_group('lahore').temperature

0     8
1    10
2     5
3     6
4    12
Name: temperature, dtype: int64

In [48]:
df.groupby('city').get_group('lahore').temperature.min()

5

In [49]:
df.groupby('city').get_group('karachi').temperature.min()

10

In [50]:
df.groupby('city').get_group('murree').temperature.min()

-7

### c. Step 3: Combine Step
- Now that we have the minimum temperature of all three cities, let us combine the results into a series object.

In [51]:
kci = df.groupby('city').get_group('karachi').temperature.min()
lhr = df.groupby('city').get_group('lahore').temperature.min()
murree = df.groupby('city').get_group('murree').temperature.min()

s1 = pd.Series(data=[kci, lhr, murree], index=['K_min', 'L_min', 'M_min'] )
s1.name= 'Min Temperatures'
s1

K_min    10
L_min     5
M_min    -7
Name: Min Temperatures, dtype: int64

>- **Let us perform the `apply + combine` steps in one go, by applying the `min()` function on the temperature series of all the dataframes inside the DataFrameGroupBy object.**
>- **This saves us from the hassle of applying `min()` method explicitly as done above**

In [52]:
df.groupby('city')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f4eaf3cceb0>

In [53]:
df.groupby('city').temperature

<pandas.core.groupby.generic.SeriesGroupBy object at 0x7f4eaf3cce20>

In [54]:
df.groupby('city').temperature.min()

city
karachi    10
lahore      5
murree     -7
Name: temperature, dtype: int64

In [None]:
df.columns

In [None]:
# df.groupby('city')[['temperature','humidity']].agg(['count','max','min','mean'])  # needs a dataframe with a humidity column, e.g. groupbydata2.csv

>- **We can also apply `agg()` method on the temperature series of all the dataframes inside the DataFrameGroupBy object**

In [55]:
df.groupby('city').temperature.agg(['min', 'max', 'sum', 'mean'])

city,min,max,sum,mean
karachi,10,18,71,14.2
lahore,5,12,41,8.2
murree,-7,-1,-20,-4.0


>Note that we have got a dataframe this time.
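As a related sketch, pandas also supports "named aggregation", where each keyword argument to `agg()` names an output column and pairs a source column with a function. The output names `min_temp`, `max_temp` and `mean_temp` are hypothetical, chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['lahore'] * 5 + ['karachi'] * 5 + ['murree'] * 5,
    'temperature': [8, 10, 5, 6, 12, 18, 10, 12, 15, 16, -5, -3, -4, -1, -7],
})

# Each keyword is an output column name; each tuple is (source column, function)
summary = df.groupby('city').agg(
    min_temp=('temperature', 'min'),
    max_temp=('temperature', 'max'),
    mean_temp=('temperature', 'mean'),
)
print(summary)
```

This form can be convenient when the result feeds later processing, because the output columns get descriptive names instead of the bare function names.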

## 4. Practice GroupBy on Stack Overflow Survey Dataset
Download the data from: https://insights.stackoverflow.com/survey/

### a. Understand the Data Set

In [None]:
import pandas as pd
df = pd.read_csv('datasets/so_survey_subset.csv', index_col='Respondent')
df.shape

In [None]:
df.head()

In [None]:
df.columns

In [None]:
df.info()

In [None]:
# df.Country
# First method
df[df.Country =='Pakistan']

In [None]:
df.loc[df['Country']=='Pakistan', :]

In [None]:
import pandas as pd
schema = pd.read_csv('datasets/so_survey_subset_schema.csv', index_col='Column')
schema

In [None]:
schema.loc['Hobbyist']

In [None]:
df['Hobbyist']

In [None]:
schema.loc['Country']

In [None]:
df['Country']

In [None]:
schema.loc['ConvertedComp']

In [None]:
df['ConvertedComp']

In [None]:
schema.loc['LanguageWorkedWith']

In [None]:
!cat datasets/so_survey_subset_schema.csv

In [None]:
df['LanguageWorkedWith']

In [None]:
schema.loc['SocialMedia']

In [None]:
df['SocialMedia']

In [None]:
df

##### Let us perform some basic statistical analysis on the Dataset

In [None]:
# Returns the count of non-NA values for a series object.
df['Hobbyist'].count()

In [None]:
# Returns a Series containing the count of each unique value.
df['Hobbyist'].value_counts()

In [None]:
# Returns the count of non-NA values for a series object.
df['Country'].count()

In [None]:
# Returns a Series containing the count of each unique value.
df['Country'].value_counts()

### To get the count of countries whose developers participated in the survey

In [None]:
df['Country'].value_counts().count()

In [None]:
# Returns the count of non-NA values for a series object.
df['ConvertedComp'].count()

In [None]:
# Returns a Series containing the count of each unique value.
df['ConvertedComp'].value_counts()

In [None]:
df['ConvertedComp'].mean()

In [None]:
df['ConvertedComp'].median()

In [None]:
df.describe()

<h1 align="center">Let us try answering certain Questions</h1>

##  Question 1: 
>**List the most popular SocialMedia web site for every Country**

**Let us first do the easy task: list the most popular SocialMedia website of a single country (let's say Pakistan)**

In [None]:
df[df.Country=='Pakistan']

In [None]:
df.loc[df.Country=='Pakistan', 'SocialMedia']

In [None]:
df.loc[df.Country=='Pakistan', 'SocialMedia'].value_counts()

In [None]:
df.columns

In [None]:
df.groupby('Country').get_group('Pakistan').loc[:,'SocialMedia'].value_counts()

In [None]:
df.loc[df.Country =='Pakistan', :]
df.loc[df.Country =='Pakistan', 'SocialMedia'].head(10)
df.loc[df.Country =='Pakistan', 'SocialMedia'].value_counts()
df.loc[df.Country =='Pakistan', 'SocialMedia'].value_counts(normalize=True)
df.loc[df.Country =='Germany', 'SocialMedia'].value_counts()

In [None]:
df.groupby('Country')

In [None]:
df.groupby('Country').get_group("Pakistan").head()

In [None]:
df.groupby('Country').get_group("Pakistan").loc[:, 'SocialMedia']

In [None]:
df.groupby('Country').get_group("Pakistan").loc[:, 'SocialMedia'].value_counts()

In [None]:
df.groupby('Country')['SocialMedia'].value_counts().head(60)

In [None]:
df.groupby('Country')['SocialMedia'].value_counts().head(50)

In [None]:
df.groupby('Country')['SocialMedia'].value_counts().head(30)
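To reduce the per-country counts to one answer per country, one possible approach is to take the most frequent value inside each group. The sketch below is an illustration on hypothetical toy data; only the column names `Country` and `SocialMedia` come from the survey dataset.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the survey dataset
toy = pd.DataFrame({
    'Country': ['Pakistan', 'Pakistan', 'Pakistan', 'Germany', 'Germany', 'Germany'],
    'SocialMedia': ['WhatsApp', 'Facebook', 'WhatsApp', 'Reddit', 'Reddit', None],
})

# value_counts() ignores missing answers; idxmax() returns the label with the highest count
most_popular = toy.groupby('Country')['SocialMedia'].agg(lambda s: s.value_counts().idxmax())
print(most_popular)
```

On real data, a country whose answers are all missing would make `idxmax()` fail on an empty series, so such groups would need to be filtered out or handled separately.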

##  Question 2: 
>**What percentage of people in each country know Python programming?**

**tc** = Total count of people from each country who participated in the survey

**pp** = Python people: count of people from each country who know Python

**tc (option 1):**

In [None]:
df

In [None]:
df.loc[:, 'Country']

In [None]:
tc = df['Country'].value_counts()
tc.name = 'Total'
tc

**tc (option 2):**

In [None]:
dfgb = df.groupby('Country')
dfgb

In [None]:
df.groupby('Country')['Country']

In [None]:
df.groupby('Country')['Country'].apply(lambda x: x.value_counts()).sort_values()

**pp:**

In [None]:
df.loc[:, 'LanguageWorkedWith']

In [None]:
df.groupby('Country')['LanguageWorkedWith'].apply(lambda x:x.str.contains('Python').sum())

In [None]:
df.groupby('Country')['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python'))

In [None]:
pp = df.groupby('Country')['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
pp

In [None]:
pp.name = 'Knows Python'

**Create a Dataframe from the two series tc and pp**

In [None]:
resultdf = pd.concat([tc, pp], axis=1)
resultdf

In [None]:
resultdf.loc['Pakistan']

In [None]:
resultdf.loc['India']

In [None]:
resultdf['Knows Python']/resultdf['Total']*100

**What percentage of people in each country know Python?**

In [None]:
resultdf['Percentage'] = (resultdf['Knows Python'] / resultdf['Total']) * 100
resultdf

In [None]:
resultdf.loc['Pakistan']

In [None]:
resultdf.sample(20).sort_values(by ='Percentage', ascending=False)

In [None]:
resultdf.sort_values(by='Percentage', ascending=False).head(20)
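The same percentage can also be sketched in one step, because the mean of a Boolean series is the fraction of `True` values. The example below uses hypothetical toy data; only the column names `Country` and `LanguageWorkedWith` come from the survey dataset.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the survey dataset
toy = pd.DataFrame({
    'Country': ['Pakistan', 'Pakistan', 'India', 'India', 'India', 'India'],
    'LanguageWorkedWith': ['Python;SQL', 'Java', 'Python', 'C;Python', 'Python;Go', None],
})

# na=False treats a missing answer as "does not know Python"
knows_python = toy['LanguageWorkedWith'].str.contains('Python', na=False)

# The mean of a Boolean series is the fraction of True values in each group
percentage = knows_python.groupby(toy['Country']).mean() * 100
print(percentage)
```

Respondents with a missing answer still count in the denominator here, which matches dividing the Python count by the total number of respondents per country.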

## Now create five questions of your own and try to answer them.

In [None]:
# Q-01 ==???
# Q-02 ==???
# Q-03 ==???
# Q-04 ==???
# Q-05 ==???

In [None]:
# columns of the dataframe
df.columns

#### Question 03 : Average salary of each gender in each country. Analyze and Visualize

In [None]:
df.Gender.value_counts()

In [None]:
# clean gender column: keep 'Man' and 'Woman', map everything else (including missing values) to 'Other'
def clean_gender(x):
    if x == 'Man':
        return 'Man'
    elif x == 'Woman':
        return 'Woman'
    else:
        return 'Other'
df.Gender = df.Gender.apply(clean_gender)
df.head()

In [None]:
# rename ConvertedComp as salary column
df.rename(columns={'ConvertedComp':'salary'}, inplace=True)

In [None]:
df[df.Country =='Pakistan'].groupby('Gender')['salary'].agg(['mean','max','min'])

In [None]:
df.groupby(['Country','Gender'])['salary'].agg(['mean','max','min'])

In [None]:
new_df = df.groupby(['Country','Gender'])['salary'].agg(['mean','max','min']).reset_index().sort_values(by=['max'], ascending=False)
new_df

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('darkgrid')

In [None]:
# sns.boxplot('', data=new2)

In [None]:
sns.barplot(x= 'Gender', y='mean', data=new_df)

In [None]:
new2 = new_df.sort_values(by='mean', ascending=False).sample(10)
plt.figure(figsize=(15,10))
sns.barplot(x='mean', y='Country', data=new2)
plt.xticks(rotation=90)
plt.show()

#### Question 04: Salary in the top ten countries according to `YearsCode`

In [None]:
df.YearsCode.unique()

In [None]:
# 0-10 years  -> Beginners
# 11-20 years -> Intermediate
# 21-30 years -> Expert
# 31-40 years -> ProExpert
# 41-50 years -> ProMaxExpert

In [None]:
def cleanYearsCode(x):
    if x=='Less than 1 year':
        return float(0)
    elif x=='More than 50 years':
        return float(50)
    else:
        return float(x)
    
df.YearsCode = df.YearsCode.apply(cleanYearsCode)


In [None]:
def modifyYearsCode(x):
    # Keep missing values missing instead of labelling them "ProMaxExpert"
    if pd.isna(x):
        return x
    elif x <= 10:
        return "Beginners"
    elif x <= 20:
        return 'Intermediate'
    elif x <= 30:
        return "Expert"
    elif x <= 40:
        return 'ProExpert'
    else:
        return "ProMaxExpert"
df.YearsCode = df.YearsCode.apply(modifyYearsCode)


In [None]:
df.YearsCode.value_counts()
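The banding done by `modifyYearsCode` can also be sketched with `pd.cut`, which assigns each number to a labelled interval. The sample values below are hypothetical, and the lowest bin edge of -1 is an assumption made so that 0 falls into the first band.

```python
import pandas as pd

# Hypothetical numeric YearsCode values, including one missing answer
years = pd.Series([0, 5, 10, 11, 20, 25, 35, 50, None], name='YearsCode')

# Bins are right-inclusive: (-1, 10], (10, 20], (20, 30], (30, 40], (40, 50]
bands = pd.cut(
    years,
    bins=[-1, 10, 20, 30, 40, 50],
    labels=['Beginners', 'Intermediate', 'Expert', 'ProExpert', 'ProMaxExpert'],
)
print(bands.value_counts())
```

Missing values stay missing in the result, and the bands form an ordered categorical, so plots and sorts follow the experience order rather than alphabetical order.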

In [None]:
plt.figure(figsize=(10,7))
sns.barplot(x='YearsCode', y='salary', data=df)
plt.show()

In [None]:
sns.countplot(x='YearsCode', data=df)
plt.show()

In [None]:
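# sort_values() needs column labels in `by`; sort by country, then by salary (highest first)
df[['Country','YearsCode','salary']].sort_values(by=['Country', 'salary'], ascending=[True, False])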

In [None]:
df.Country.value_counts().head(15)

In [None]:
df1 = df.groupby('Country')[['YearsCode','salary']].value_counts().reset_index().sort_values(by='salary', ascending=False).sample(10)

In [None]:
df.groupby('Country')[['YearsCode','salary']].value_counts()

In [None]:
plt.figure(figsize=(10,10))
sns.barplot(y='Country',x='salary',hue='YearsCode', data=df1)
plt.show()

## Practice Exercise 01

#### Step 1. Import the necessary libraries

#### Step 2. Load the dataset and assign it to a variable called `drinks`.

In [None]:
drinks = pd.read_csv('datasets/drinks.csv')
drinks.head()

In [None]:
drinks.info()

#### Step 4. Which continent drinks more beer on average?

List of all continents
```
Asia 
Africa 
Europe 
North America 
South America
Australia/Oceania 
Antarctica
```

In [None]:
drinks.columns

In [None]:
drinks.continent.value_counts()

In [None]:
drinks.groupby('continent').agg({'beer_servings':'mean'}).sort_values(by='beer_servings')

#### Step 5. For each continent print the statistics for wine consumption.

In [None]:
# df.groupby('')
drinks.columns

In [None]:
# drinks[['continent','wine_servings']]
drinks.groupby('continent')['wine_servings'].describe()

#### Step 6. Print the mean alcohol consumption per continent for every column

In [None]:
drinks.columns

In [None]:
# numeric_only=True skips text columns, so the mean is reported for every numeric column
drinks.groupby('continent').mean(numeric_only=True)

#### Step 7. Print the median alcohol consumption per continent for every column

In [None]:
# numeric_only=True skips text columns, so the median is reported for every numeric column
drinks.groupby('continent').median(numeric_only=True)

In [None]:
drinks.columns

#### Step 8. Print the mean, min and max values for spirit consumption.
#### This time output a DataFrame

In [None]:
drinks.groupby('continent').agg({'spirit_servings':['mean','min','max']})

In [1]:
from IPython.display import HTML

style = """
    <style>
        body {
            background-color: #f2fff2;
        }
        h1 {
            text-align: center;
            font-weight: bold;
            font-size: 36px;
            color: #4295F4;
            text-decoration: underline;
            padding-top: 15px;
        }
        
        h2 {
            text-align: left;
            font-weight: bold;
            font-size: 30px;
            color: #4A000A;
            text-decoration: underline;
            padding-top: 10px;
        }
        
        h3 {
            text-align: left;
            font-weight: bold;
            font-size: 30px;
            color: #f0081e;
            text-decoration: underline;
            padding-top: 5px;
        }

        
        p {
            text-align: center;
            font-size: 12px;
            color: #0B9923;
        }
    </style>
"""

html_content = """
<h1>Hello</h1>
<p>Hello World</p>
<h2> Hello</h2>
<h3> World </h3>
"""

HTML(style + html_content)