### **Week 3 Assignment** 
### Sandhya Mainali
### Presidential Graduate School
### PRG 303: Python Programming
### Professor Pant
### March 23, 2025

Data on Nepal's economic indicators were gathered for this assignment from multiple sources, the main one being the Publications and Statistics section of the official Nepal Rastra Bank (NRB) portal. The statistics cover remittance inflow, trade balance, inflation rates, interest rates in Nepali banks, and other major economic indicators. These datasets are published in several formats, such as Excel and PDF files. To extract organized data from these files and enable further analysis, they had to be converted into machine-readable formats such as CSV and JSON.
Several challenges arose while collecting the data. Some datasets had to be extracted manually from PDF files because they were not directly available in a structured format. In other cases the most recent figures were not available from NRB, so data were collected from other websites, such as bankbyaj.com for interest rates and other economic analyses for the trade balance numbers. These challenges were overcome by cleaning, normalizing, and structuring the data with Python preprocessing techniques from the Pandas module.

The data were collected mainly by downloading a series of Excel and PDF files from the statistics section of the official Nepal Rastra Bank website. The extracted data include:

- Inflation rates from the Macroeconomic Indicators section.

- Interest rates of Nepali banks, as reported by NRB and bankbyaj.com.

- Remittance inflows and trade balances obtained from NRB’s income statements.

- Other economic indicators such as GDP growth, money supply, and foreign exchange reserves.

The biggest challenge was finding datasets covering the period up to October 2023. For earlier fiscal years, complete data were not always available: some datasets had missing values or incomplete records, or required corroboration from other sources. Moreover, several PDFs contained tabular information that was not readily extractable, so we had to use PDF-to-Excel converter tools and data scraping techniques.
## Data Processing
After data gathering, the next step was processing the raw data to ensure its consistency and usability. To make the data easier to handle in Pandas, the datasets were reshaped into structured formats such as CSV and JSON. The main processing steps included:

- Extracting tables from PDFs using libraries such as pdfplumber and Tabula.

- Converting the Excel files into CSV.

- Bringing the datasets into a standard date format, for example by aligning them to fiscal years or months for time-series analysis (Cares, 2023b).
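
The date-alignment and conversion steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the mixed fiscal-year labels (`"2017/18"`, `"2018-19"`, `"FY 2019/20"`) and the `Fiscal Year` column are hypothetical, since the downloaded files in this notebook already carry a plain `Year` column. The inflation values are the first three rows of the CPI table shown later.

```python
import io

import pandas as pd

# Hypothetical raw table: fiscal-year labels written in mixed styles,
# as they might appear after extraction from different source files.
raw = pd.DataFrame({
    "Fiscal Year": ["2017/18", "2018-19", "FY 2019/20"],
    "CPI Inflation (%)": [4.5, 4.2, 4.1],
})

# Keep the first four-digit year of each label as an integer "Year" key,
# so that every dataset can later be aligned on the same column.
raw["Year"] = (
    raw["Fiscal Year"].str.extract(r"(\d{4})", expand=False).astype("int64")
)
tidy = raw[["Year", "CPI Inflation (%)"]]

# Serialize to the two machine-readable formats mentioned in the text.
csv_text = tidy.to_csv(index=False)
json_text = tidy.to_json(orient="records")

# Reading the CSV text back gives the same table.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```

Using the starting year of the fiscal year as the key is one possible convention; the ending year would work equally well as long as every dataset follows the same rule.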

## Data Cleaning
Numerous anomalies, including missing values, duplicate entries, and improper data formats, were present in the raw datasets. The following methods were used to clean the data:

- Handling missing data: rows with incomplete fiscal year data were either eliminated using `dropna()` or forward-filled from the previous year's value.
- Normalizing numeric fields: To ensure consistency in the units of measurement (e.g., billions versus millions), numeric values such as trade balances and remittance inflows were normalized (GeeksforGeeks, 2025d).
- Eliminating duplicates: `drop_duplicates()` was used to remove duplicate records.
- Data type conversion: To give numerical fields the correct data types, for example casting interest rates to float values, the `astype()` function was used (Cares, 2023b).
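
A minimal sketch of these cleaning steps is shown below. The raw table is hypothetical: it assumes a remittance column reported in Rs. million and read in as text, with one duplicated row and one missing value, which is not how the files in this notebook actually arrived. The figures themselves are the notebook's remittance values scaled to millions.

```python
import pandas as pd

# Hypothetical raw extract with typical problems: a duplicated 2018 row,
# a missing 2019 value, numbers stored as text, and units in Rs. million.
raw = pd.DataFrame({
    "Year": [2017, 2018, 2018, 2019, 2020],
    "Remittance Inflow (Rs. million)": ["699850", "733610", "733610", None, "961050"],
})

clean = (
    raw.drop_duplicates()   # remove the repeated 2018 record
       .dropna()            # drop the row with the missing value
       .astype({"Remittance Inflow (Rs. million)": "float64"})  # text -> float
       .reset_index(drop=True)
)

# Normalize units: convert Rs. million to Rs. billion so all tables agree.
clean["Remittance Inflow (Rs. billion)"] = (
    clean["Remittance Inflow (Rs. million)"] / 1_000
)
clean = clean.drop(columns="Remittance Inflow (Rs. million)")

print(clean)
```

Dropping rows is the simplest option for a short annual series; forward-filling keeps the row count intact but repeats the previous year's value, so the choice depends on how the gap should be treated in later analysis.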

## Data Transformation
To make the data more insightful, several transformation techniques were applied:
- Year-on-Year (YoY) percentage changes: YoY growth trends in inflation, remittance inflows, and trade balances were estimated by the following formula:
Year-over-Year Change = ((Current Year Value - Previous Year Value) / Previous Year Value) * 100
- Moving Averages: Moving averages (3-month and 12-month) were calculated for inflation and remittance figures in order to smooth out the short-term volatility.
- Indexing to Base Year: Economic indicators were converted into indices by selecting a base year and scaling each value relative to the base-year value (GeeksforGeeks, 2019).
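
The three transformations can be sketched on the remittance inflow series from this notebook. The choice of 2017 as the base year and of a 3-year window are assumptions made for illustration only; because the series here is annual, the window counts years rather than months.

```python
import pandas as pd

# Remittance inflow figures as listed in this notebook.
remit = pd.DataFrame({
    "Year": [2017, 2018, 2019, 2020, 2021, 2022, 2023],
    "Remittance Inflow (Rs. billion)": [
        699.85, 733.61, 879.12, 961.05, 1023.50, 1229.82, 1304.77
    ],
})
col = "Remittance Inflow (Rs. billion)"

# Year-on-year change: (current - previous) / previous * 100.
# The first year has no predecessor, so its value is NaN.
remit["YoY Change (%)"] = remit[col].pct_change() * 100

# Moving average over a 3-year window (assumed window size).
remit["3-Year MA"] = remit[col].rolling(window=3).mean()

# Index to a base year (assumed: 2017 = 100).
base_value = remit.loc[remit["Year"] == 2017, col].iloc[0]
remit["Index (2017 = 100)"] = remit[col] / base_value * 100

print(remit.round(2))
```

The YoY value computed for 2018 rounds to 4.8, which agrees with the `Growth (%)` column reported for that year in the remittance table.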

## Conclusion
This assignment covered the collection, processing, and transformation of Nepal's economic indicators. Nepal Rastra Bank and other associated financial sources provided the datasets. The challenges encountered included missing values, unstructured formats, and the need to extract data manually. Python and Pandas were used to clean and normalize the data in order to overcome them.
Raw data was converted into an analysis-friendly format suitable for time-series analysis using several processing techniques. The year-on-year changes and moving averages gave a clearer picture of Nepal's economy. The processed data can now support further economic analysis and decision-making. Future improvements may include real-time updating of the datasets through automated web scraping and API-based retrieval.





In [7]:
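## Inflation rate data

import pandas as pd
from IPython.display import display

# Path to the Excel workbook with the CPI inflation data
file_path = "data/CPI_Inflation_Data.xlsx"

# Read the first sheet into a DataFrame
CPI = pd.read_excel(file_path, sheet_name="Sheet1")

# Save a CSV copy for later analysis, then show the table
CPI.to_csv("data/CPI_Inflation_Data.csv", index=False)
display(CPI)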

Unnamed: 0,Year,CPI Inflation (%)
0,2017,4.5
1,2018,4.2
2,2019,4.1
3,2020,5.6
4,2021,6.2
5,2022,7.1
6,2023,6.8


In [8]:
# Interest rate data
import pandas as pd
from IPython.display import display

# Path to the Excel workbook with the repo and reverse repo rates
file_path = "data/Interest_Rates_Data.xlsx"
Interest_Rate = pd.read_excel(file_path, sheet_name="Sheet1")

# Save a CSV copy for later analysis, then show the table
Interest_Rate.to_csv("data/Interest_Rates_Data.csv", index=False)
display(Interest_Rate)

Unnamed: 0,Year,Repo Rate (%),Reverse Repo Rate (%)
0,2017,5.0,5.5
1,2018,5.5,6.0
2,2019,5.5,6.0
3,2020,4.0,4.5
4,2021,3.0,3.5
5,2022,2.5,3.0
6,2023,2.5,3.0


In [9]:
# Economic data (GDP growth)
import pandas as pd
from IPython.display import display

# Path to the Excel workbook with the GDP growth data
file_path = "data/GDP_Growth_Data.xlsx"

economic_data = pd.read_excel(file_path, sheet_name="Sheet1")

# Save a CSV copy for later analysis, then show the table
economic_data.to_csv("data/GDP_Growth_Data.csv", index=False)
display(economic_data)

Unnamed: 0,Year,GDP Growth (%)
0,2017,7.5
1,2018,6.3
2,2019,7.1
3,2020,-1.9
4,2021,4.0
5,2022,5.1
6,2023,6.7


In [10]:
# Remittance data
import pandas as pd
from IPython.display import display

# Path to the Excel workbook with the remittance inflow data
file_path = "data/Remittance_Data.xlsx"
remittance_data = pd.read_excel(file_path, sheet_name="Sheet1")

# Save a CSV copy for later analysis, then show the table
remittance_data.to_csv("data/Remittance_Data.csv", index=False)
display(remittance_data)

Unnamed: 0,Year,Remittance Inflow (Rs. billion),Growth (%)
0,2017,699.85,5.2
1,2018,733.61,4.8
2,2019,879.12,19.8
3,2020,961.05,9.3
4,2021,1023.5,6.5
5,2022,1229.82,20.1
6,2023,1304.77,6.1


In [None]:
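# Trade data
import pandas as pd
from IPython.display import display

# Path to the Excel workbook with the export, import, and trade deficit data
file_path = "data/Trade_Data.xlsx"
trade_data = pd.read_excel(file_path, sheet_name="Sheet1")

# Save a CSV copy for later analysis, then show the table
trade_data.to_csv("data/Trade_Data.csv", index=False)
display(trade_data)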

Unnamed: 0,Year,Exports (Rs. billion),Imports (Rs. billion),Trade Deficit (Rs. billion)
0,2017,81.2,984.5,903.3
1,2018,97.1,1229.6,1132.5
2,2019,121.5,1534.7,1413.2
3,2020,144.3,1602.4,1458.1
4,2021,187.2,1804.9,1617.7
5,2022,215.0,2258.6,2043.6
6,2023,234.5,2507.1,2272.6


In [12]:
## Normalization and YoY changes for the data above
import pandas as pd
from IPython.display import display

# Data preparation for all datasets
remittance_data = pd.DataFrame({
    "Year": [2017, 2018, 2019, 2020, 2021, 2022, 2023],
    "Remittance Inflow (Rs. billion)": [699.85, 733.61, 879.12, 961.05, 1023.50, 1229.82, 1304.77],
    "Growth (%)": [5.2, 4.8, 19.8, 9.3, 6.5, 20.1, 6.1]
})

trade_data = pd.DataFrame({
    "Year": [2017, 2018, 2019, 2020, 2021, 2022, 2023],
    "Exports (Rs. billion)": [81.2, 97.1, 121.5, 144.3, 187.2, 215.0, 234.5],
    "Imports (Rs. billion)": [984.5, 1229.6, 1534.7, 1602.4, 1804.9, 2258.6, 2507.1],
    "Trade Deficit (Rs. billion)": [903.3, 1132.5, 1413.2, 1458.1, 1617.7, 2043.6, 2272.6]
})

cpi_inflation_data = pd.DataFrame({
    "Year": [2017, 2018, 2019, 2020, 2021, 2022, 2023],
    "CPI Inflation (%)": [4.5, 4.2, 4.1, 5.6, 6.2, 7.1, 6.8]
})

interest_rates_data = pd.DataFrame({
    "Year": [2017, 2018, 2019, 2020, 2021, 2022, 2023],
    "Repo Rate (%)": [5.0, 5.5, 5.5, 4.0, 3.0, 2.5, 2.5],
    "Reverse Repo Rate (%)": [5.5, 6.0, 6.0, 4.5, 3.5, 3.0, 3.0]
})

gdp_growth_data = pd.DataFrame({
    "Year": [2017, 2018, 2019, 2020, 2021, 2022, 2023],
    "GDP Growth (%)": [7.5, 6.3, 7.1, -1.9, 4.0, 5.1, 6.7]
})

# Handling missing or inconsistent values for CPI Inflation
# (the 2021 value is blanked out here only to demonstrate forward-filling)
cpi_inflation_data.loc[4, 'CPI Inflation (%)'] = pd.NA  # Simulate missing value for 2021
# Series.ffill() replaces the deprecated fillna(method='ffill')
cpi_inflation_data['CPI Inflation (%)'] = cpi_inflation_data['CPI Inflation (%)'].ffill()

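# Calculate year-on-year percentage changes and moving averages
cpi_inflation_data['YoY Change (%)'] = cpi_inflation_data['CPI Inflation (%)'].pct_change() * 100
# Two-year moving average of the annual inflation rate
cpi_inflation_data['MA CPI Inflation'] = cpi_inflation_data['CPI Inflation (%)'].rolling(window=2).mean()
# 'Growth (%)' is already the YoY change in remittance inflow, so this column
# measures how much the growth rate itself changed from the previous year
remittance_data['YoY Growth Change (%)'] = remittance_data['Growth (%)'].pct_change() * 100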

# Display all data
display(remittance_data)
display(trade_data)
display(cpi_inflation_data)
display(interest_rates_data)
display(gdp_growth_data)



Unnamed: 0,Year,Remittance Inflow (Rs. billion),Growth (%),YoY Growth Change (%)
0,2017,699.85,5.2,
1,2018,733.61,4.8,-7.692308
2,2019,879.12,19.8,312.5
3,2020,961.05,9.3,-53.030303
4,2021,1023.5,6.5,-30.107527
5,2022,1229.82,20.1,209.230769
6,2023,1304.77,6.1,-69.651741


Unnamed: 0,Year,Exports (Rs. billion),Imports (Rs. billion),Trade Deficit (Rs. billion)
0,2017,81.2,984.5,903.3
1,2018,97.1,1229.6,1132.5
2,2019,121.5,1534.7,1413.2
3,2020,144.3,1602.4,1458.1
4,2021,187.2,1804.9,1617.7
5,2022,215.0,2258.6,2043.6
6,2023,234.5,2507.1,2272.6


Unnamed: 0,Year,CPI Inflation (%),YoY Change (%),MA CPI Inflation
0,2017,4.5,,
1,2018,4.2,-6.666667,4.35
2,2019,4.1,-2.380952,4.15
3,2020,5.6,36.585366,4.85
4,2021,5.6,0.0,5.6
5,2022,7.1,26.785714,6.35
6,2023,6.8,-4.225352,6.95


Unnamed: 0,Year,Repo Rate (%),Reverse Repo Rate (%)
0,2017,5.0,5.5
1,2018,5.5,6.0
2,2019,5.5,6.0
3,2020,4.0,4.5
4,2021,3.0,3.5
5,2022,2.5,3.0
6,2023,2.5,3.0


Unnamed: 0,Year,GDP Growth (%)
0,2017,7.5
1,2018,6.3
2,2019,7.1
3,2020,-1.9
4,2021,4.0
5,2022,5.1
6,2023,6.7


In [13]:
## Merge data
from functools import reduce

import pandas as pd
from IPython.display import display

# Define the paths to the Excel files
remittance_path = "data/Remittance_Data.xlsx"
trade_path = "data/Trade_Data.xlsx"
interest_rates_path = "data/Interest_Rates_Data.xlsx"
gdp_growth_path = "data/GDP_Growth_Data.xlsx"
cpi_inflation_path = "data/CPI_Inflation_Data.xlsx"

# Load data from each Excel file, keeping only the columns needed for the merge
remittance_data = pd.read_excel(remittance_path, engine='openpyxl')[["Year", "Remittance Inflow (Rs. billion)", "Growth (%)"]]
trade_data = pd.read_excel(trade_path, engine='openpyxl')[["Year", "Exports (Rs. billion)", "Imports (Rs. billion)"]]
interest_rates_data = pd.read_excel(interest_rates_path, engine='openpyxl')[["Year", "Repo Rate (%)", "Reverse Repo Rate (%)"]]
gdp_growth_data = pd.read_excel(gdp_growth_path, engine='openpyxl')[["Year", "GDP Growth (%)"]]
cpi_inflation_data = pd.read_excel(cpi_inflation_path, engine='openpyxl')[["Year", "CPI Inflation (%)"]]

# Merge all datasets on the 'Year' column (inner join, the pd.merge default)
data_frames = [remittance_data, trade_data, interest_rates_data, gdp_growth_data, cpi_inflation_data]
consolidated_data = reduce(lambda left, right: pd.merge(left, right, on='Year'), data_frames)

# Calculate the trade deficit
consolidated_data['Trade Deficit (Rs. billion)'] = consolidated_data['Imports (Rs. billion)'] - consolidated_data['Exports (Rs. billion)']

# Reorder columns to match your required format
column_order = ["Remittance Inflow (Rs. billion)", "Growth (%)", "Exports (Rs. billion)",
                "Imports (Rs. billion)", "Trade Deficit (Rs. billion)", "Repo Rate (%)",
                "Reverse Repo Rate (%)", "CPI Inflation (%)", "GDP Growth (%)"]
consolidated_data = consolidated_data[['Year'] + column_order]

# Display the consolidated DataFrame
display(consolidated_data)

Unnamed: 0,Year,Remittance Inflow (Rs. billion),Growth (%),Exports (Rs. billion),Imports (Rs. billion),Trade Deficit (Rs. billion),Repo Rate (%),Reverse Repo Rate (%),CPI Inflation (%),GDP Growth (%)
0,2017,699.85,5.2,81.2,984.5,903.3,5.0,5.5,4.5,7.5
1,2018,733.61,4.8,97.1,1229.6,1132.5,5.5,6.0,4.2,6.3
2,2019,879.12,19.8,121.5,1534.7,1413.2,5.5,6.0,4.1,7.1
3,2020,961.05,9.3,144.3,1602.4,1458.1,4.0,4.5,5.6,-1.9
4,2021,1023.5,6.5,187.2,1804.9,1617.7,3.0,3.5,6.2,4.0
5,2022,1229.82,20.1,215.0,2258.6,2043.6,2.5,3.0,7.1,5.1
6,2023,1304.77,6.1,234.5,2507.1,2272.6,2.5,3.0,6.8,6.7


## References
- Cares, L. (2023). Handy Python pandas for handling missing values. Medium. https://learner-cares.medium.com/handy-pandas-python-library-for-handling-missing-values-dc5f0d1ebf82
- GeeksforGeeks. (2025c). Introduction of database normalization. GeeksforGeeks. https://www.geeksforgeeks.org/introduction-of-database-normalization/
- GeeksforGeeks. (2019). Python | Pandas DataFrame.transform. GeeksforGeeks. https://www.geeksforgeeks.org/python-pandas-dataframe-transform/