In [40]:
import numpy as np
import pandas as pd

# **PANDAS**

### **Today's Agenda**:
#### **Adding and Dropping Columns and Rows**  
#### **Index methods**   
#### **Update & Modify Rows**   
#### **Groupby**   
#### **Missing Data**   





#### Reading From CSV

In [41]:
df = pd.read_csv('world-happiness-report-2021.csv')

### **Variables**

- **Ladder score** - Happiness score or subjective well-being. This is the national average response to the question of life evaluations.

  **Question:** Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?

- **Logged GDP per capita** - The GDP-per-capita time series from 2019 to 2020 using country-specific forecasts of real GDP growth in 2020.

- **Social support** - The national average of the binary responses (either 0 or 1) to the GWP (Gallup World Poll) question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

- **Healthy life expectancy** - Healthy life expectancies at birth are based on data extracted from the World Health Organization’s (WHO) Global Health Observatory data repository (last updated: 2020-09-28).

- **Freedom to make life choices** - The national average of responses to the GWP question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

- **Generosity** - The residual of regressing the national average of responses to the GWP question “Have you donated money to a charity in the past month?” on GDP per capita.

- **Perceptions of corruption** - The national average of the survey responses to two GWP questions: “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?”

In [74]:
df1 = df[['Country name', 'Regional indicator', 'Ladder score',
       'Logged GDP per capita', 'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption']].copy()
df1.set_index('Country name', inplace=True)
df1

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Finland,Western Europe,7.842,10.775,0.954,72.000,0.949,-0.098,0.186
Denmark,Western Europe,7.620,10.933,0.954,72.700,0.946,0.030,0.179
Switzerland,Western Europe,7.571,11.117,0.942,74.400,0.919,0.025,0.292
Iceland,Western Europe,7.554,10.878,0.983,73.000,0.955,0.160,0.673
Netherlands,Western Europe,7.464,10.932,0.942,72.400,0.913,0.175,0.338
...,...,...,...,...,...,...,...,...
Lesotho,Sub-Saharan Africa,3.512,7.926,0.787,48.700,0.715,-0.131,0.915
Botswana,Sub-Saharan Africa,3.467,9.782,0.784,59.269,0.824,-0.246,0.801
Rwanda,Sub-Saharan Africa,3.415,7.676,0.552,61.400,0.897,0.061,0.167
Zimbabwe,Sub-Saharan Africa,3.145,7.943,0.750,56.201,0.677,-0.047,0.821


In [43]:
df1.shape

(149, 8)

In [5]:
df1.head()

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Finland,Western Europe,7.842,10.775,0.954,72.0,0.949,-0.098,0.186
Denmark,Western Europe,7.62,10.933,0.954,72.7,0.946,0.03,0.179
Switzerland,Western Europe,7.571,11.117,0.942,74.4,0.919,0.025,0.292
Iceland,Western Europe,7.554,10.878,0.983,73.0,0.955,0.16,0.673
Netherlands,Western Europe,7.464,10.932,0.942,72.4,0.913,0.175,0.338


In [6]:
df1.tail()

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Lesotho,Sub-Saharan Africa,3.512,7.926,0.787,48.7,0.715,-0.131,0.915
Botswana,Sub-Saharan Africa,3.467,9.782,0.784,59.269,0.824,-0.246,0.801
Rwanda,Sub-Saharan Africa,3.415,7.676,0.552,61.4,0.897,0.061,0.167
Zimbabwe,Sub-Saharan Africa,3.145,7.943,0.75,56.201,0.677,-0.047,0.821
Afghanistan,South Asia,2.523,7.695,0.463,52.493,0.382,-0.102,0.924


In [44]:
df1.sample(3)

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Philippines,Southeast Asia,5.88,9.076,0.83,62.0,0.917,-0.097,0.742
Iran,Middle East and North Africa,4.721,9.584,0.71,66.3,0.608,0.218,0.714
Kuwait,Middle East and North Africa,6.106,10.817,0.843,66.9,0.867,-0.104,0.736


In [45]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
Index: 149 entries, Finland to Afghanistan
Data columns (total 8 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Regional indicator            149 non-null    object 
 1   Ladder score                  149 non-null    float64
 2   Logged GDP per capita         149 non-null    float64
 3   Social support                149 non-null    float64
 4   Healthy life expectancy       149 non-null    float64
 5   Freedom to make life choices  149 non-null    float64
 6   Generosity                    149 non-null    float64
 7   Perceptions of corruption     149 non-null    float64
dtypes: float64(7), object(1)
memory usage: 10.5+ KB


In [10]:
df1.describe()

,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
count,149.0,149.0,149.0,149.0,149.0,149.0,149.0
mean,5.532839,9.432208,0.814745,64.992799,0.791597,-0.015134,0.72745
std,1.073924,1.158601,0.114889,6.762043,0.113332,0.150657,0.179226
min,2.523,6.635,0.463,48.478,0.382,-0.288,0.082
25%,4.852,8.541,0.75,59.802,0.718,-0.126,0.667
50%,5.534,9.569,0.832,66.603,0.804,-0.036,0.781
75%,6.255,10.421,0.905,69.6,0.877,0.079,0.845
max,7.842,11.647,0.983,76.953,0.97,0.542,0.939


In [46]:
df1.isnull().sum()

Regional indicator              0
Ladder score                    0
Logged GDP per capita           0
Social support                  0
Healthy life expectancy         0
Freedom to make life choices    0
Generosity                      0
Perceptions of corruption       0
dtype: int64
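The agenda lists **Missing Data**, but this dataset has no missing values (every count above is 0). As a minimal sketch on a hypothetical toy frame that does contain NaNs, the usual first steps are counting, dropping and filling:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the happiness data above has no missing values
toy = pd.DataFrame(
    {'Ladder score': [7.8, np.nan, 5.1, 4.2],
     'Generosity': [0.1, -0.05, np.nan, np.nan]},
    index=['A', 'B', 'C', 'D'],
)

missing_per_column = toy.isnull().sum()   # number of NaNs in each column
rows_complete = toy.dropna()              # keep only rows without any NaN
filled = toy.fillna(toy.mean())           # replace each NaN with its column mean

print(missing_per_column)
print(rows_complete)
print(filled)
```

`dropna()` and `fillna()` return new frames, so the original `toy` is left unchanged.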

In [12]:
df1.columns

Index(['Regional indicator', 'Ladder score', 'Logged GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption'],
      dtype='object')

##### **value_counts()**

In [47]:
df1.index

Index(['Finland', 'Denmark', 'Switzerland', 'Iceland', 'Netherlands', 'Norway',
       'Sweden', 'Luxembourg', 'New Zealand', 'Austria',
       ...
       'Burundi', 'Yemen', 'Tanzania', 'Haiti', 'Malawi', 'Lesotho',
       'Botswana', 'Rwanda', 'Zimbabwe', 'Afghanistan'],
      dtype='object', name='Country name', length=149)

In [14]:
df['Regional indicator'].value_counts()

Sub-Saharan Africa                    36
Western Europe                        21
Latin America and Caribbean           20
Middle East and North Africa          17
Central and Eastern Europe            17
Commonwealth of Independent States    12
Southeast Asia                         9
South Asia                             7
East Asia                              6
North America and ANZ                  4
Name: Regional indicator, dtype: int64

#### Exercise (slicing, loc & iloc):

- Select the 'Freedom to make life choices' and 'Healthy life expectancy' values of Germany, Ethiopia, Peru, Chad and Japan.

In [170]:
df1.loc[['Germany','Ethiopia','Peru','Chad','Japan'], ['Freedom to make life choices', 'Healthy life expectancy']]

Country name,Freedom to make life choices,Healthy life expectancy
Germany,0.875,72.5
Ethiopia,0.752,59.0
Peru,0.822,68.25
Chad,0.579,48.478
Japan,0.796,75.1


In [52]:
df1.head(1)

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Finland,Western Europe,7.842,10.775,0.954,72.0,0.949,-0.098,0.186


In [53]:
df1.iloc[[25,52,65,145], [1,7]]

Country name,Ladder score,Perceptions of corruption
Saudi Arabia,6.494,0.684
Hungary,5.992,0.876
Ecuador,5.764,0.843
Botswana,3.467,0.801

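One detail worth remembering when slicing: `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end position, like a Python list. A small sketch using four rows copied from the outputs above:

```python
import pandas as pd

# Four rows copied from the df1 output above
scores = pd.DataFrame(
    {'Ladder score': [7.842, 7.620, 7.571, 7.554]},
    index=pd.Index(['Finland', 'Denmark', 'Switzerland', 'Iceland'], name='Country name'),
)

# .loc slices by label and INCLUDES the end label
by_label = scores.loc['Finland':'Switzerland']   # Finland, Denmark, Switzerland

# .iloc slices by position and EXCLUDES the end position
by_position = scores.iloc[0:2]                   # Finland, Denmark

print(by_label)
print(by_position)
```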

### Conditional Filtering

#### Exercise:Conditional Filtering

- Let's get the happiness score (Ladder score) of the countries in Central and Eastern Europe
- **and** whose perceptions of corruption are higher than 0.9
    


In [54]:
filt = (df1['Regional indicator']=='Central and Eastern Europe')& (df1['Perceptions of corruption']>0.9)

df1.loc[filt,'Ladder score']

Country name
Kosovo                    6.372
Slovakia                  6.331
Romania                   6.140
Croatia                   5.882
Bosnia and Herzegovina    5.813
Bulgaria                  5.266
Albania                   5.117
North Macedonia           5.101
Name: Ladder score, dtype: float64

#### Exercise:Conditional Filtering

- Let's get the happiness score (Ladder score) of the countries in Sub-Saharan Africa
- **or** whose healthy life expectancy is lower than 50
    


In [19]:
filt = (df1['Regional indicator']=='Sub-Saharan Africa') | (df1['Healthy life expectancy']<50)
df1.loc[filt,'Ladder score']

Country name
Mauritius              6.049
Congo (Brazzaville)    5.342
Ivory Coast            5.306
Cameroon               5.142
Senegal                5.132
Ghana                  5.088
Niger                  5.074
Gambia                 5.051
Benin                  5.045
Guinea                 4.984
South Africa           4.956
Gabon                  4.852
Burkina Faso           4.834
Mozambique             4.794
Nigeria                4.759
Mali                   4.723
Uganda                 4.636
Liberia                4.625
Kenya                  4.607
Namibia                4.574
Chad                   4.355
Swaziland              4.308
Comoros                4.289
Ethiopia               4.275
Mauritania             4.227
Madagascar             4.208
Togo                   4.107
Zambia                 4.073
Sierra Leone           3.849
Burundi                3.775
Tanzania               3.623
Malawi                 3.600
Lesotho                3.512
Botswana               3.467
R

#### Exercise:Conditional Filtering

- Let's **randomly** pick 7 countries and show their 'Regional indicator', 'Ladder score' and 'Perceptions of corruption' values

- Condition: those countries are **not** in Western Europe **and** their 'Perceptions of corruption' score is below 0.66 (i.e. the rows where "Western Europe **or** score >= 0.66" is False)
    


In [66]:
filt = (df1['Regional indicator']=='Western Europe') | (df1['Perceptions of corruption']>=0.66)
df1.loc[~filt,['Regional indicator','Ladder score','Perceptions of corruption']].sample(7)

Country name,Regional indicator,Ladder score,Perceptions of corruption
Canada,North America and ANZ,7.103,0.415
Uzbekistan,Commonwealth of Independent States,6.179,0.515
Armenia,Commonwealth of Independent States,5.283,0.629
Uruguay,Latin America and Caribbean,6.431,0.59
Belarus,Commonwealth of Independent States,5.534,0.627
United Arab Emirates,Middle East and North Africa,6.561,0.589
Rwanda,Sub-Saharan Africa,3.415,0.167

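When combining masks, each comparison needs its own parentheses, because `&`, `|` and `~` bind more tightly than `==` and `>=` in Python. Negating an OR mask with `~` is the same as AND-ing the negated masks (De Morgan's law), which is why `~filt` above keeps rows that are outside Western Europe **and** below 0.66. A sketch on hypothetical toy data, also showing `isin()` as a shorthand for several `==` tests joined with `|`:

```python
import pandas as pd

# Hypothetical toy data (not taken from the report)
toy = pd.DataFrame(
    {'Regional indicator': ['Western Europe', 'South Asia', 'East Asia', 'Western Europe'],
     'Perceptions of corruption': [0.19, 0.92, 0.50, 0.70]},
    index=['A', 'B', 'C', 'D'],
)

is_we = toy['Regional indicator'] == 'Western Europe'
high_corr = toy['Perceptions of corruption'] >= 0.66

# De Morgan: NOT (A or B) is the same as (NOT A) and (NOT B)
negated_or = ~(is_we | high_corr)
and_of_nots = ~is_we & ~high_corr

# isin() replaces a chain of == comparisons joined with |
asia = toy['Regional indicator'].isin(['South Asia', 'East Asia'])

print(toy[negated_or])
print(toy[asia])
```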

### **Adding and Dropping Columns and Rows**

##### -To add a column to df  --> df['new column name'] = values
##### -To drop a column from df --> df.drop('column name', axis=1, inplace=True)
##### -To add a row to df  --> df = pd.concat([df, row.to_frame().T])  (df.append(row) was deprecated in pandas 1.4 and removed in 2.0)
##### -To drop a row from df --> df.drop('row name', axis=0)
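A self-contained sketch of these four operations on a tiny hypothetical frame, using `pd.concat` for the row insert since `DataFrame.append` no longer exists in pandas 2.x:

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({'score': [7.8, 7.6]},
                  index=pd.Index(['Finland', 'Denmark'], name='Country name'))

df['new'] = [0.1, 0.2]                    # add a column
df = df.drop('new', axis=1)               # drop a column (returns a new frame)

row = pd.Series({'score': 7.2}, name='Germany')
df = pd.concat([df, row.to_frame().T])    # add a row: Series -> one-row DataFrame
df = df.drop('Denmark', axis=0)           # drop a row by its index label

print(df)
```

The Series `name` becomes the new row's index label, which is why `new_row.name` is set before appending in the cells below.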



In [67]:
# to add new column 

new = np.random.rand(149)



In [68]:
df1['new'] = new

In [69]:
df1

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,new
Finland,Western Europe,7.842,10.775,0.954,72.000,0.949,-0.098,0.186,0.455133
Denmark,Western Europe,7.620,10.933,0.954,72.700,0.946,0.030,0.179,0.271796
Switzerland,Western Europe,7.571,11.117,0.942,74.400,0.919,0.025,0.292,0.735174
Iceland,Western Europe,7.554,10.878,0.983,73.000,0.955,0.160,0.673,0.991947
Netherlands,Western Europe,7.464,10.932,0.942,72.400,0.913,0.175,0.338,0.283606
...,...,...,...,...,...,...,...,...,...
Lesotho,Sub-Saharan Africa,3.512,7.926,0.787,48.700,0.715,-0.131,0.915,0.534563
Botswana,Sub-Saharan Africa,3.467,9.782,0.784,59.269,0.824,-0.246,0.801,0.940773
Rwanda,Sub-Saharan Africa,3.415,7.676,0.552,61.400,0.897,0.061,0.167,0.702884
Zimbabwe,Sub-Saharan Africa,3.145,7.943,0.750,56.201,0.677,-0.047,0.821,0.074784


In [70]:
# to drop column
df1.drop('new',axis=1, inplace=True)
df1

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Finland,Western Europe,7.842,10.775,0.954,72.000,0.949,-0.098,0.186
Denmark,Western Europe,7.620,10.933,0.954,72.700,0.946,0.030,0.179
Switzerland,Western Europe,7.571,11.117,0.942,74.400,0.919,0.025,0.292
Iceland,Western Europe,7.554,10.878,0.983,73.000,0.955,0.160,0.673
Netherlands,Western Europe,7.464,10.932,0.942,72.400,0.913,0.175,0.338
...,...,...,...,...,...,...,...,...
Lesotho,Sub-Saharan Africa,3.512,7.926,0.787,48.700,0.715,-0.131,0.915
Botswana,Sub-Saharan Africa,3.467,9.782,0.784,59.269,0.824,-0.246,0.801
Rwanda,Sub-Saharan Africa,3.415,7.676,0.552,61.400,0.897,0.061,0.167
Zimbabwe,Sub-Saharan Africa,3.145,7.943,0.750,56.201,0.677,-0.047,0.821


In [81]:
# to add row
new_row= df1.loc['Germany']
new_row

Regional indicator              Western Europe
Ladder score                             7.155
Logged GDP per capita                   10.873
Social support                           0.903
Healthy life expectancy                   72.5
Freedom to make life choices             0.875
Generosity                               0.011
Perceptions of corruption                 0.46
Name: Germany, dtype: object

In [76]:
new_row.name='New_name'

In [77]:
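# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
# to_frame().T turns the Series into a one-row DataFrame; infer_objects() restores the numeric dtypes.
df1 = pd.concat([df1, new_row.to_frame().T.infer_objects()])
df1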

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Finland,Western Europe,7.842,10.775,0.954,72.000,0.949,-0.098,0.186
Denmark,Western Europe,7.620,10.933,0.954,72.700,0.946,0.030,0.179
Switzerland,Western Europe,7.571,11.117,0.942,74.400,0.919,0.025,0.292
Iceland,Western Europe,7.554,10.878,0.983,73.000,0.955,0.160,0.673
Netherlands,Western Europe,7.464,10.932,0.942,72.400,0.913,0.175,0.338
...,...,...,...,...,...,...,...,...
Botswana,Sub-Saharan Africa,3.467,9.782,0.784,59.269,0.824,-0.246,0.801
Rwanda,Sub-Saharan Africa,3.415,7.676,0.552,61.400,0.897,0.061,0.167
Zimbabwe,Sub-Saharan Africa,3.145,7.943,0.750,56.201,0.677,-0.047,0.821
Afghanistan,South Asia,2.523,7.695,0.463,52.493,0.382,-0.102,0.924


In [78]:
df1.drop('New_name', axis=0, inplace=True)
df1

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Finland,Western Europe,7.842,10.775,0.954,72.000,0.949,-0.098,0.186
Denmark,Western Europe,7.620,10.933,0.954,72.700,0.946,0.030,0.179
Switzerland,Western Europe,7.571,11.117,0.942,74.400,0.919,0.025,0.292
Iceland,Western Europe,7.554,10.878,0.983,73.000,0.955,0.160,0.673
Netherlands,Western Europe,7.464,10.932,0.942,72.400,0.913,0.175,0.338
...,...,...,...,...,...,...,...,...
Lesotho,Sub-Saharan Africa,3.512,7.926,0.787,48.700,0.715,-0.131,0.915
Botswana,Sub-Saharan Africa,3.467,9.782,0.784,59.269,0.824,-0.246,0.801
Rwanda,Sub-Saharan Africa,3.415,7.676,0.552,61.400,0.897,0.061,0.167
Zimbabwe,Sub-Saharan Africa,3.145,7.943,0.750,56.201,0.677,-0.047,0.821


In [106]:
# reset_index: move 'Country name' from the index back into a regular column

df1.reset_index(inplace=True)

In [107]:
df1.index

RangeIndex(start=0, stop=149, step=1)

In [109]:
df1.set_index('Country name', inplace=True)
df1

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Finland,Western Europe,7.842,10.775,0.954,72.000,0.949,-0.098,0.186
Denmark,Western Europe,7.620,10.933,0.954,72.700,0.946,0.030,0.179
Switzerland,Western Europe,7.571,11.117,0.942,74.400,0.919,0.025,0.292
Iceland,Western Europe,7.554,10.878,0.983,73.000,0.955,0.160,0.673
Netherlands,Western Europe,7.464,10.932,0.942,72.400,0.913,0.175,0.338
...,...,...,...,...,...,...,...,...
Lesotho,Sub-Saharan Africa,3.512,7.926,0.787,48.700,0.715,-0.131,0.915
Botswana,Sub-Saharan Africa,3.467,9.782,0.784,59.269,0.824,-0.246,0.801
Rwanda,Sub-Saharan Africa,3.415,7.676,0.552,61.400,0.897,0.061,0.167
Zimbabwe,Sub-Saharan Africa,3.145,7.943,0.750,56.201,0.677,-0.047,0.821


In [35]:
#reset_index

In [36]:
#set_index

In [37]:
#sort_index
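The three placeholder cells above (`reset_index`, `set_index`, `sort_index`) can be sketched as follows; the three rows reuse values from the outputs above:

```python
import pandas as pd

df = pd.DataFrame({
    'Country name': ['Zimbabwe', 'Finland', 'Denmark'],
    'Ladder score': [3.145, 7.842, 7.620],
})

indexed = df.set_index('Country name')       # column -> index
sorted_idx = indexed.sort_index()            # sort rows alphabetically by index label
desc = indexed.sort_index(ascending=False)   # reverse alphabetical order
back = sorted_idx.reset_index()              # index -> column, new 0..n-1 RangeIndex

print(sorted_idx)
print(back)
```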


### Update & Modify Rows

`to update one row` --> df1.loc[2, ['last','email']] = ['Smith', 'JohnSmith@email.com']


`to update multiple rows`: we have several options.

######  The first is the basic one: df['email'] = df['email'].str.lower()

#####  Advanced Options:
`Apply`: works on both Series (a single column) and DataFrames, but is mostly used with Series.
######  Apply on a Series --> df['email'].apply(len) <-- returns the length of each email in the email column (email Series)

`Replace`: works like `map`, but values that are not in the mapping keep their original value: df['first_name'].replace({'chris':'Mike','jane':'Angela'}) <-- 'John' keeps its value (with `map`, unmatched values become NaN).
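A sketch contrasting these options on a hypothetical `people` frame like the one the examples above assume (the `first_name` and `email` values are made up):

```python
import pandas as pd

# Hypothetical data matching the first_name/email examples above
people = pd.DataFrame({
    'first_name': ['chris', 'jane', 'John'],
    'email': ['Chris@Email.com', 'JANE@email.com', 'john@email.com'],
})

lower = people['email'].str.lower()        # vectorised string method
lengths = people['email'].apply(len)       # apply a function to each element

mapping = {'chris': 'Mike', 'jane': 'Angela'}
mapped = people['first_name'].map(mapping)         # 'John' is not in mapping -> NaN
replaced = people['first_name'].replace(mapping)   # 'John' is kept as-is

print(mapped.tolist())
print(replaced.tolist())
```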

In [82]:
#df2

df2= df1.copy()
df2

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Finland,Western Europe,7.842,10.775,0.954,72.000,0.949,-0.098,0.186
Denmark,Western Europe,7.620,10.933,0.954,72.700,0.946,0.030,0.179
Switzerland,Western Europe,7.571,11.117,0.942,74.400,0.919,0.025,0.292
Iceland,Western Europe,7.554,10.878,0.983,73.000,0.955,0.160,0.673
Netherlands,Western Europe,7.464,10.932,0.942,72.400,0.913,0.175,0.338
...,...,...,...,...,...,...,...,...
Lesotho,Sub-Saharan Africa,3.512,7.926,0.787,48.700,0.715,-0.131,0.915
Botswana,Sub-Saharan Africa,3.467,9.782,0.784,59.269,0.824,-0.246,0.801
Rwanda,Sub-Saharan Africa,3.415,7.676,0.552,61.400,0.897,0.061,0.167
Zimbabwe,Sub-Saharan Africa,3.145,7.943,0.750,56.201,0.677,-0.047,0.821


In [83]:
df2['Social support'].apply(np.square)

Country name
Finland        0.910116
Denmark        0.910116
Switzerland    0.887364
Iceland        0.966289
Netherlands    0.887364
                 ...   
Lesotho        0.619369
Botswana       0.614656
Rwanda         0.304704
Zimbabwe       0.562500
Afghanistan    0.214369
Name: Social support, Length: 149, dtype: float64

In [65]:
# Update the Ladder score of Afghanistan (South Asia)

In [97]:
df2.loc['Afghanistan']['Ladder score']

7.567

In [98]:
# Be careful: this chained assignment triggers a SettingWithCopyWarning and may not update df2 at all

df2.loc['Afghanistan']['Ladder score']=8.654

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2.loc['Afghanistan']['Ladder score']=8.654


In [94]:
df2.loc['Afghanistan']

Regional indicator              South Asia
Ladder score                         2.653
Logged GDP per capita                7.695
Social support                       0.463
Healthy life expectancy             52.493
Freedom to make life choices         0.382
Generosity                          -0.102
Perceptions of corruption            0.924
Name: Afghanistan, dtype: object

In [99]:
df2.loc['Afghanistan','Ladder score'] = 7.678


In [100]:
df2.loc['Afghanistan']

Regional indicator              South Asia
Ladder score                         7.678
Logged GDP per capita                7.695
Social support                       0.463
Healthy life expectancy             52.493
Freedom to make life choices         0.382
Generosity                          -0.102
Perceptions of corruption            0.924
Name: Afghanistan, dtype: object
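The safe pattern is a single indexing call that names both the row and the column, so pandas writes into `df2` itself rather than into a temporary copy. A sketch on a one-row stand-in for `df2`; the conditional update at the end is a hypothetical example of changing several rows at once:

```python
import pandas as pd

# One-row stand-in for df2
df2 = pd.DataFrame(
    {'Regional indicator': ['South Asia'], 'Ladder score': [2.523]},
    index=pd.Index(['Afghanistan'], name='Country name'),
)

# One .loc call with (row label, column label) writes into df2 itself
df2.loc['Afghanistan', 'Ladder score'] = 7.678

# .at is a fast accessor for reading or writing a single scalar
value = df2.at['Afghanistan', 'Ladder score']

# Hypothetical multi-row update: round the score of every South Asian country
mask = df2['Regional indicator'] == 'South Asia'
df2.loc[mask, 'Ladder score'] = df2.loc[mask, 'Ladder score'].round(1)

print(df2)
```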

#### **Apply**

In [277]:
df2['Healthy life expectancy'].head()

Country name
Afghanistan    52.493
Albania        68.999
Algeria        66.005
Argentina      69.000
Armenia        67.055
Name: Healthy life expectancy, dtype: float64

In [284]:
df2['Healthy life expectancy'].apply(np.square)

Country name
Afghanistan    2755.515049
Albania        4760.862001
Algeria        4356.660025
Argentina      4761.000000
Armenia        4496.373025
                  ...     
Venezuela      4448.890000
Vietnam        4628.625156
Yemen          3262.922884
Zambia         3114.644481
Zimbabwe       3158.552401
Name: Healthy life expectancy, Length: 149, dtype: float64

### **Exercise**

Based on our dataset's 'Healthy life expectancy' scores, let's create a new column named **'Explained Healthy life expectancy'** whose values are:
- Use the mean of 'Healthy life expectancy' as our threshold.
- If 'Healthy life expectancy' is greater than or equal to the 75th percentile (third quartile) of the column, write **'Long Life expectancy'**.
- If it is below the 75th percentile but above our threshold, write **'Longer than normal expectancy'**.
- If it is equal to or below our threshold but above the 25th percentile (first quartile), write **'Normal life expectancy'**.
- If it is equal to or below the 25th percentile, write **'Lower life expectancy'**.
- Since we have projects in the United States, make sure that 'United States' is written as **'United States of America'**.


In [287]:
df2['Healthy life expectancy'].head()

Country name
Afghanistan    52.493
Albania        68.999
Algeria        66.005
Argentina      69.000
Armenia        67.055
Name: Healthy life expectancy, dtype: float64

In [288]:
df2['Healthy life expectancy'].describe()

count    149.000000
mean      64.992799
std        6.762043
min       48.478000
25%       59.802000
50%       66.603000
75%       69.600000
max       76.953000
Name: Healthy life expectancy, dtype: float64

In [289]:
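# Thresholds come from df2['Healthy life expectancy'].describe() above
stats = df2['Healthy life expectancy'].describe()
q25, mean, q75 = stats['25%'], stats['mean'], stats['75%']   # about 59.802, 64.993, 69.6

def long_life(life):
    if life >= q75:
        return 'Long Life expectancy'
    elif life > mean:
        return 'Longer than normal expectancy'
    elif life > q25:
        return 'Normal life expectancy'
    else:
        return 'Lower life expectancy'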

In [291]:
df2['Explained Healthy life expectancy']= df2['Healthy life expectancy'].apply(long_life)

In [305]:
df2.rename(index={'United States':'United States of America'},inplace=True)

In [311]:
df1.loc['United States','Healthy life expectancy']

68.2

In [307]:
df2.loc['United States of America','Explained Healthy life expectancy']

'Longer than normal expectancy'
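The row-by-row `apply` above works, but the same labelling can be done in one vectorised step with `np.select`, which picks, for each value, the choice of the first condition that is True. A sketch with the thresholds hardcoded from the `describe()` output and a few test values (only 52.493 is a real row, Afghanistan):

```python
import numpy as np
import pandas as pd

life = pd.Series([52.493, 59.802, 64.0, 66.0, 69.6, 75.1])
q25, mean, q75 = 59.802, 64.992799, 69.6   # from describe() above

# Conditions are checked in order, so each one only needs its lower bound
conditions = [life >= q75, life > mean, life > q25]
choices = ['Long Life expectancy',
           'Longer than normal expectancy',
           'Normal life expectancy']
labels = pd.Series(np.select(conditions, choices, default='Lower life expectancy'),
                   index=life.index)

print(labels)
```

On the real data, `life` would be `df2['Healthy life expectancy']`; the boundary handling (`>=` for the top bin, `>` for the others) matches the `long_life` function.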

#### **Replace**

In [314]:
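# replace() returns a new Series; assign it back to the same column.
# (With inplace=True it returns None, which must never be assigned to a column.)
df2['Explained Healthy life expectancy'] = df2['Explained Healthy life expectancy'].replace({'Longer than normal expectancy': 'Life is short'})
df2.loc['United States of America','Explained Healthy life expectancy']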

'Life is short'

### Groupby

In [None]:
# Use groupby on 'Regional indicator' to view the mean happiness score (Ladder score) of each region

# What can we say?
# Can we say that happiness differs by region, given that the mean happiness score of some regions is higher than that of others?

In [111]:
df1.columns

Index(['Regional indicator', 'Ladder score', 'Logged GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption'],
      dtype='object')

In [113]:
df2['Regional indicator'].value_counts()

Sub-Saharan Africa                    36
Western Europe                        21
Latin America and Caribbean           20
Middle East and North Africa          17
Central and Eastern Europe            17
Commonwealth of Independent States    12
Southeast Asia                         9
South Asia                             7
East Asia                              6
North America and ANZ                  4
Name: Regional indicator, dtype: int64

In [131]:
df2['Regional indicator'].value_counts(normalize=True)

Sub-Saharan Africa                    0.241611
Western Europe                        0.140940
Latin America and Caribbean           0.134228
Central and Eastern Europe            0.114094
Middle East and North Africa          0.114094
Commonwealth of Independent States    0.080537
Southeast Asia                        0.060403
South Asia                            0.046980
East Asia                             0.040268
North America and ANZ                 0.026846
Name: Regional indicator, dtype: float64

In [138]:
df2.index

Index(['Finland', 'Denmark', 'Switzerland', 'Iceland', 'Netherlands', 'Norway',
       'Sweden', 'Luxembourg', 'New Zealand', 'Austria',
       ...
       'Burundi', 'Yemen', 'Tanzania', 'Haiti', 'Malawi', 'Lesotho',
       'Botswana', 'Rwanda', 'Zimbabwe', 'Afghanistan'],
      dtype='object', name='Country name', length=149)

In [114]:
df1.groupby(['Regional indicator'])

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fa5c6d8f070>

In [122]:
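# numeric_only=True skips the text column 'Regional indicator'; pandas 2.x raises a TypeError without it
df2.groupby('Regional indicator').get_group('Middle East and North Africa').mean(numeric_only=True)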

Ladder score                     5.219765
Logged GDP per capita            9.666118
Social support                   0.797647
Healthy life expectancy         65.609118
Freedom to make life choices     0.716471
Generosity                      -0.079765
Perceptions of corruption        0.762235
dtype: float64

In [118]:
df2.groupby('Regional indicator').get_group('Middle East and North Africa')

Country name,Regional indicator,Ladder score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
Israel,Middle East and North Africa,7.157,10.575,0.939,73.503,0.8,0.031,0.753
Bahrain,Middle East and North Africa,6.647,10.669,0.862,69.495,0.925,0.089,0.722
United Arab Emirates,Middle East and North Africa,6.561,11.085,0.844,67.333,0.932,0.074,0.589
Saudi Arabia,Middle East and North Africa,6.494,10.743,0.891,66.603,0.877,-0.149,0.684
Kuwait,Middle East and North Africa,6.106,10.817,0.843,66.9,0.867,-0.104,0.736
Libya,Middle East and North Africa,5.41,9.622,0.827,62.3,0.771,-0.087,0.667
Turkey,Middle East and North Africa,4.948,10.24,0.822,67.199,0.576,-0.139,0.776
Morocco,Middle East and North Africa,4.918,8.903,0.56,66.208,0.774,-0.236,0.801
Algeria,Middle East and North Africa,4.887,9.342,0.802,66.005,0.48,-0.067,0.752
Iraq,Middle East and North Africa,4.854,9.24,0.746,60.583,0.63,-0.053,0.875
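Instead of calling `get_group` once per region, `groupby` can compute statistics for every region in a single call, which answers the question in the Groupby cell directly. A sketch on four rows copied from the outputs above; on the full data, replace `toy` with `df2`:

```python
import pandas as pd

# Four rows copied from the outputs above
toy = pd.DataFrame({
    'Regional indicator': ['Western Europe', 'Western Europe',
                           'Sub-Saharan Africa', 'Sub-Saharan Africa'],
    'Ladder score': [7.842, 7.620, 3.512, 3.145],
}, index=['Finland', 'Denmark', 'Lesotho', 'Zimbabwe'])

# Mean Ladder score per region, happiest region first
region_means = (
    toy.groupby('Regional indicator')['Ladder score']
       .mean()
       .sort_values(ascending=False)
)

# Several statistics per region at once
summary = toy.groupby('Regional indicator')['Ladder score'].agg(['count', 'mean', 'min', 'max'])

print(region_means)
print(summary)
```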


**Exercise**: We are working as data scientists at an internationally well-known infrastructure company that runs projects all over the world. Right now, our company's focus areas are 'Southeast Asia' and 'South Asia'.

First, our boss wants a descriptive report on institutional integrity and on how far citizens trust their governments, to decide whether it is worth investing a large amount of money and time in a project in those regions.

Our boss does not have much background in statistics or in reading statistical results, so as data scientists we should prepare something concise and easy to understand.

Let's start with the following steps:

- First, decide which variable is a good choice to work on.

- Second, give our boss both the numbers and a plain-language explanation of them (as we did before).

- Compare the two regions in the text.

- Our report should contain:

  - the overall (all-regions) figures,

  - 'Southeast Asia' and 'South Asia',

  - the variable and a modified (labelled) version of it.
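One possible starting point, sketched under assumptions: 'Perceptions of corruption' is taken as the variable (it is the closest to institutional integrity and trust in government; higher means more perceived corruption), and the plain-language labels use the overall quartiles as cut-offs, mirroring the life-expectancy exercise. The helper name `corruption_report` is hypothetical, and the toy rows reuse values from the outputs above:

```python
import pandas as pd

def corruption_report(df, regions, col='Perceptions of corruption'):
    """Hypothetical helper: overall and per-region means of `col`,
    each with a plain-language label based on the overall quartiles."""
    q25, q75 = df[col].quantile([0.25, 0.75])

    def label(value):
        if value >= q75:
            return 'High perceived corruption'
        elif value > q25:
            return 'Moderate perceived corruption'
        return 'Low perceived corruption'

    rows = {'All regions': df[col].mean()}
    for region in regions:
        rows[region] = df.loc[df['Regional indicator'] == region, col].mean()

    report = pd.DataFrame({'mean': pd.Series(rows)})
    report['explanation'] = report['mean'].apply(label)
    return report

# Toy rows copied from the outputs above; use df2 for the real report
toy = pd.DataFrame({
    'Regional indicator': ['Western Europe', 'Western Europe', 'Southeast Asia',
                           'South Asia', 'Middle East and North Africa'],
    'Perceptions of corruption': [0.186, 0.179, 0.742, 0.924, 0.736],
}, index=['Finland', 'Denmark', 'Philippines', 'Afghanistan', 'Kuwait'])

report = corruption_report(toy, ['Southeast Asia', 'South Asia'])
print(report)
```

The numeric column gives the boss the figures, and the `explanation` column gives the plain-language reading the exercise asks for.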

