
**About Dataset**
Context
Inflation trends matter to policymakers, economists, and researchers who study global economic dynamics. This dataset covers the years 1980 to 2024 and reports annual inflation rates for countries worldwide, with one row per country and one column per year.
Dataset link: https://www.kaggle.com/datasets/sazidthe1/global-inflation-data?resource=download

**1. Import all the required libraries**

In [3]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder

**3. Load the dataset into pandas dataframe.**

In [4]:
# Load the dataset into a pandas data frame
df = pd.read_csv('/content/global_inflation_data.csv')

**4. Data preprocessing: check for missing values using pandas isnull(), use describe() to get initial statistics, identify the variable types, and check the dimensions of the data frame.**

In [5]:
df.columns

Index(['country_name', 'indicator_name', '1980', '1981', '1982', '1983',
       '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992',
       '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001',
       '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010',
       '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019',
       '2020', '2021', '2022', '2023', '2024'],
      dtype='object')

In [6]:
df.head()

Unnamed: 0,country_name,indicator_name,1980,1981,1982,1983,1984,1985,1986,1987,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Afghanistan,Annual average inflation (consumer prices) rate,13.4,22.2,18.2,15.9,20.4,8.7,-2.1,18.4,...,-0.66,4.38,4.98,0.63,2.3,5.44,5.06,13.71,9.1,
1,Albania,Annual average inflation (consumer prices) rate,,,,,,,,,...,1.9,1.3,2.0,2.0,1.4,1.6,2.0,6.7,4.8,4.0
2,Algeria,Annual average inflation (consumer prices) rate,9.7,14.6,6.6,7.8,6.3,10.4,14.0,5.9,...,4.8,6.4,5.6,4.3,2.0,2.4,7.2,9.3,9.0,6.8
3,Andorra,Annual average inflation (consumer prices) rate,,,,,,,,,...,-1.1,-0.4,2.6,1.0,0.5,0.1,1.7,6.2,5.2,3.5
4,Angola,Annual average inflation (consumer prices) rate,46.7,1.4,1.8,1.8,1.8,1.8,1.8,1.8,...,9.2,30.7,29.8,19.6,17.1,22.3,25.8,21.4,13.1,22.3


In [7]:
df['indicator_name'].unique()

array(['Annual average inflation (consumer prices) rate'], dtype=object)

In [8]:
df.tail()

Unnamed: 0,country_name,indicator_name,1980,1981,1982,1983,1984,1985,1986,1987,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
191,Vietnam,Annual average inflation (consumer prices) rate,25.2,69.6,95.4,49.5,64.9,91.6,453.5,360.4,...,0.6,2.7,3.5,3.5,2.8,3.2,1.8,3.2,3.4,3.4
192,West Bank and Gaza,Annual average inflation (consumer prices) rate,,,,,,,,,...,1.4,-0.2,0.2,-0.2,1.6,-0.7,1.2,3.7,3.4,2.7
193,Yemen,Annual average inflation (consumer prices) rate,,,,,,,,,...,22.0,21.3,30.4,33.6,15.7,21.7,31.5,29.5,14.9,17.3
194,Zambia,Annual average inflation (consumer prices) rate,11.7,14.0,12.5,19.7,20.0,37.4,48.0,43.0,...,10.1,17.9,6.6,7.5,9.2,15.7,22.0,11.0,10.6,9.6
195,Zimbabwe,Annual average inflation (consumer prices) rate,,5.6,0.6,-8.5,-1.9,-16.0,10.7,12.8,...,-2.4,-1.6,0.9,10.6,255.3,557.2,98.5,193.4,314.5,222.4


In [9]:
df.head(10)

Unnamed: 0,country_name,indicator_name,1980,1981,1982,1983,1984,1985,1986,1987,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Afghanistan,Annual average inflation (consumer prices) rate,13.4,22.2,18.2,15.9,20.4,8.7,-2.1,18.4,...,-0.66,4.38,4.98,0.63,2.3,5.44,5.06,13.71,9.1,
1,Albania,Annual average inflation (consumer prices) rate,,,,,,,,,...,1.9,1.3,2.0,2.0,1.4,1.6,2.0,6.7,4.8,4.0
2,Algeria,Annual average inflation (consumer prices) rate,9.7,14.6,6.6,7.8,6.3,10.4,14.0,5.9,...,4.8,6.4,5.6,4.3,2.0,2.4,7.2,9.3,9.0,6.8
3,Andorra,Annual average inflation (consumer prices) rate,,,,,,,,,...,-1.1,-0.4,2.6,1.0,0.5,0.1,1.7,6.2,5.2,3.5
4,Angola,Annual average inflation (consumer prices) rate,46.7,1.4,1.8,1.8,1.8,1.8,1.8,1.8,...,9.2,30.7,29.8,19.6,17.1,22.3,25.8,21.4,13.1,22.3
5,Antigua and Barbuda,Annual average inflation (consumer prices) rate,19.0,11.5,4.2,2.3,3.8,1.0,0.5,3.6,...,1.0,-0.5,2.4,1.2,1.4,1.1,1.6,7.5,5.0,2.9
6,Argentina,Annual average inflation (consumer prices) rate,,,,,,,,,...,,,25.7,34.3,53.5,42.0,48.4,72.4,121.7,93.7
7,Armenia,Annual average inflation (consumer prices) rate,,,,,,,,,...,3.7,-1.4,1.2,2.5,1.4,1.2,7.2,8.6,3.5,4.0
8,Aruba,Annual average inflation (consumer prices) rate,,,,,,,,3.6,...,0.5,-0.9,-1.0,3.6,3.9,-1.3,0.7,5.5,4.5,2.3
9,Australia,Annual average inflation (consumer prices) rate,10.1,9.5,11.4,10.0,4.0,6.7,9.1,8.5,...,1.5,1.3,2.0,1.9,1.6,0.9,2.8,6.6,5.8,4.0


In [10]:
# Quick overview of the global inflation dataset: non-null count and data type of each column
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 47 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   country_name    196 non-null    object 
 1   indicator_name  196 non-null    object 
 2   1980            140 non-null    float64
 3   1981            144 non-null    float64
 4   1982            145 non-null    float64
 5   1983            145 non-null    float64
 6   1984            145 non-null    float64
 7   1985            145 non-null    float64
 8   1986            145 non-null    float64
 9   1987            147 non-null    float64
 10  1988            147 non-null    float64
 11  1989            147 non-null    float64
 12  1990            150 non-null    float64
 13  1991            155 non-null    float64
 14  1992            158 non-null    float64
 15  1993            169 non-null    float64
 16  1994            171 non-null    float64
 17  1995            172 non-null    flo

In [11]:
# Summary statistics for each year column: count, mean, std, min, quartiles, and max
df.describe()

Unnamed: 0,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
count,140.0,144.0,145.0,145.0,145.0,145.0,145.0,147.0,147.0,147.0,...,194.0,194.0,195.0,195.0,195.0,194.0,194.0,194.0,192.0,191.0
mean,21.757143,17.796528,17.029655,19.177241,26.97931,103.215172,25.262069,111.294558,58.635374,101.246259,...,4.116186,6.594742,7.656821,339.688359,107.294872,19.83268,16.577629,13.616031,13.736458,9.309424
std,33.656118,18.992691,22.797064,34.806824,111.889811,975.748316,86.93121,1081.094434,400.370989,679.792142,...,10.763149,31.096216,34.954954,4681.227548,1425.256254,173.722612,117.154632,25.282229,39.667874,25.195589
min,-7.3,0.0,-0.9,-8.5,-7.4,-16.0,-17.6,-31.2,-13.0,-9.6,...,-3.8,-5.6,-13.3,-44.4,-3.2,-2.6,-3.0,-3.2,-0.8,1.2
25%,9.55,8.6,6.1,5.0,3.8,2.8,1.8,2.15,2.55,3.35,...,0.1,0.1,1.15,1.3,0.8,0.4,1.925,5.5,4.0,2.8
50%,13.85,12.5,10.3,8.7,8.0,7.1,5.8,5.9,6.8,6.9,...,1.5,1.5,2.4,2.5,2.2,1.9,3.5,8.1,5.8,4.0
75%,20.525,19.8,16.7,16.0,17.1,16.8,18.2,16.65,17.8,16.7,...,4.8,5.125,5.2,4.3,4.0,4.575,5.975,11.975,9.925,5.8
max,316.6,116.8,123.6,275.6,1281.3,11749.6,885.2,13109.5,4775.2,7428.7,...,121.7,346.1,438.1,65374.1,19906.0,2355.1,1588.5,193.4,360.0,222.4


In [12]:
# check dimensions of the data frame.

df.shape

(196, 47)
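
The step heading mentions isnull(), but the cells above only show non-null counts through df.info(). A minimal sketch of counting missing values directly is given below. It uses a small hypothetical frame named `sample`, built from the first three rows printed by df.head() and restricted to the 1980 and 1981 columns, so that it runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: first three rows from df.head(), 1980 and 1981 only.
sample = pd.DataFrame({
    "country_name": ["Afghanistan", "Albania", "Algeria"],
    "1980": [13.4, np.nan, 9.7],
    "1981": [22.2, np.nan, 14.6],
})

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()

# Share of missing values in each column, as a percentage.
missing_pct = sample.isnull().mean() * 100

# Rows that contain at least one missing value.
rows_with_missing = sample[sample.isnull().any(axis=1)]

print(missing_per_column)
print(missing_pct.round(1))
print(rows_with_missing["country_name"].tolist())
```

On the full data frame, the same call would be `df.isnull().sum()`. Its result should agree with the df.info() output above: for example, 1980 has 140 non-null values out of 196 rows, which leaves 56 missing.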

In [13]:
# To check data types
column_dtype = df.dtypes

In [14]:
print(column_dtype)

country_name       object
indicator_name     object
1980              float64
1981              float64
1982              float64
1983              float64
1984              float64
1985              float64
1986              float64
1987              float64
1988              float64
1989              float64
1990              float64
1991              float64
1992              float64
1993              float64
1994              float64
1995              float64
1996              float64
1997              float64
1998              float64
1999              float64
2000              float64
2001              float64
2002              float64
2003              float64
2004              float64
2005              float64
2006              float64
2007              float64
2008              float64
2009              float64
2010              float64
2011              float64
2012              float64
2013              float64
2014              float64
2015              float64
2016        

**5. Data Normalization**

In [15]:
# Initialize the MinMaxScaler (default output range is 0 to 1)
scaler = MinMaxScaler()

# Fit the scaler to the 1980 column and transform it; missing values remain NaN
scaled_1980 = scaler.fit_transform(df[['1980']])


In [25]:
scaled_1980

array([[0.06390861],
       [       nan],
       [0.05248533],
       [       nan],
       [0.16671812],
       [0.0811979 ],
       [       nan],
       [       nan],
       [       nan],
       [0.05372028],
       [0.04198827],
       [       nan],
       [0.06020377],
       [0.03426984],
       [0.04631059],
       [0.07965421],
       [       nan],
       [0.04322322],
       [0.04445817],
       [0.0521766 ],
       [0.06112998],
       [0.16795307],
       [       nan],
       [0.05989503],
       [0.30101883],
       [       nan],
       [0.02253782],
       [0.0605125 ],
       [0.02624267],
       [0.06915715],
       [       nan],
       [0.04631059],
       [0.05402902],
       [0.06359988],
       [0.04908923],
       [0.1309046 ],
       [       nan],
       [0.10250077],
       [0.06359988],
       [0.14603273],
       [0.04507564],
       [0.07841927],
       [       nan],
       [0.06421735],
       [       nan],
       [0.0497067 ],
       [0.05742513],
       [     
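
With its default range of 0 to 1, MinMaxScaler rescales each value as (x - min) / (max - min), using the column minimum and maximum and leaving missing values as NaN. The sketch below reproduces that arithmetic by hand with pandas. It assumes the 1980 minimum (-7.3) and maximum (316.6) reported by df.describe() and the first three 1980 values from df.head(); the names `col_min`, `col_max`, `values`, and `scaled` are illustrative only.

```python
import numpy as np
import pandas as pd

# Minimum and maximum of the 1980 column, as reported by df.describe().
col_min, col_max = -7.3, 316.6

# First three 1980 values from df.head(): Afghanistan, Albania (missing), Algeria.
values = pd.Series([13.4, np.nan, 9.7], name="1980")

# Min-max formula: (x - min) / (max - min). A NaN input stays NaN.
scaled = (values - col_min) / (col_max - col_min)

print(scaled.round(8).tolist())
```

The first and third results should match the corresponding entries of the scaled_1980 array above (about 0.0639 and 0.0525), and the second stays NaN because Albania has no 1980 value.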

In [17]:
variable_types = {'character': [], 'numeric': [], 'integer': []}

# Iterate over columns and check data types
for column in df.columns:
    dtype = df[column].dtype

    # Check if the data type is object (character)
    if dtype == 'object':
        variable_types['character'].append(column)
    # Check if the data type is numeric
    elif dtype == 'float64':
        variable_types['numeric'].append(column)
    # Check if the data type is integer
    elif dtype == 'int64':
        variable_types['integer'].append(column)

# Print summary of variable types
print("Character Variables:", variable_types['character'])
print("Numeric Variables:", variable_types['numeric'])
print("Integer Variables:", variable_types['integer'])

Character Variables: ['country_name', 'indicator_name']
Numeric Variables: ['1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']
Integer Variables: []
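
The loop above compares dtypes one by one. As a possible shortcut, pandas offers `select_dtypes`, which can split columns into numeric and non-numeric groups in one call each. The sketch below uses a hypothetical three-row frame named `sample` (values taken from df.head()) rather than the full dataset, and it groups floats and integers together under "number" instead of reporting them separately.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: first three rows from df.head(), 1980 and 1981 only.
sample = pd.DataFrame({
    "country_name": ["Afghanistan", "Albania", "Algeria"],
    "1980": [13.4, np.nan, 9.7],
    "1981": [22.2, np.nan, 14.6],
})

# Columns with a numeric dtype (floats and integers alike).
numeric_columns = sample.select_dtypes(include="number").columns.tolist()

# Everything else, here the text column.
non_numeric_columns = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric Variables:", numeric_columns)
print("Character Variables:", non_numeric_columns)
```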


In [23]:
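# Type conversion: cast the 1980 column from float64 to string (object dtype).
# After this cast the column holds text, so numeric operations such as
# scaling or describe() no longer apply to it.
df['1980'] = df['1980'].astype(str)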

In [24]:
df['1980'].dtype

dtype('O')
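
Casting to string is reversible. One way to restore a numeric column is `pd.to_numeric` with `errors="coerce"`, which turns any text that cannot be parsed as a number into NaN. The sketch below is a round trip on three 1980 values from df.head(); the names `values`, `as_text`, and `restored` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Three 1980 values from df.head(); the middle one is missing.
values = pd.Series([13.4, np.nan, 9.7], name="1980")

# Cast to string, as the cell above does for df['1980'].
as_text = values.astype(str)

# Convert back to numbers; unparseable text becomes NaN instead of raising.
restored = pd.to_numeric(as_text, errors="coerce")

print(restored.dtype)
print(restored.tolist())
```

On the notebook's data frame, the equivalent call would be `df['1980'] = pd.to_numeric(df['1980'], errors='coerce')`, assuming the column should be numeric again for later steps.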

**6. Turn categorical values into quantitative variables**

In [18]:
# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Convert the categorical country names into integer codes
df["country_name_edited"] = label_encoder.fit_transform(df['country_name'])

In [19]:
df['country_name_edited']


0        0
1        1
2        2
3        3
4        4
      ... 
191    191
192    192
193    193
194    194
195    195
Name: country_name_edited, Length: 196, dtype: int64
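
LabelEncoder assigns integer codes according to the sorted order of the unique labels. The country names appear to be unique and already in alphabetical order, which would explain why each code above equals the row index. The sketch below shows the same idea with pandas alone, using `pd.factorize(..., sort=True)` as a stand-in for LabelEncoder on a short, deliberately unsorted list with one repeated name. The series `countries` is hypothetical.

```python
import pandas as pd

# Hypothetical, unsorted sample with one repeated country name.
countries = pd.Series(
    ["Algeria", "Afghanistan", "Albania", "Algeria"], name="country_name"
)

# sort=True numbers the unique labels in sorted order, like LabelEncoder does.
codes, uniques = pd.factorize(countries, sort=True)

# Map the integer codes back to the original names.
decoded = uniques.take(codes)

print(codes.tolist())
print(list(uniques))
print(decoded.tolist())
```

The codes are arbitrary identifiers and carry no numeric meaning: a larger code only means the name sorts later alphabetically, so models that treat the column as an ordered quantity may be misled.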