# Matplotlib Visualizations

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib notebook

# Bar plot and Box Plot

These plots have been constructed using the National Family Health Survey dataset, 2019-2021[1].

The National Family Health Survey (NFHS) is a large-scale, multi-round survey conducted in a representative sample of households throughout India. The NFHS is a collaborative project of the International Institute for Population Sciences (IIPS), Mumbai, India; ICF, Calverton, Maryland, USA; and the East-West Center, Honolulu, Hawaii, USA. The Ministry of Health and Family Welfare (MOHFW), Government of India, designated IIPS as the nodal agency responsible for providing coordination and technical guidance for the NFHS. The NFHS was funded by the United States Agency for International Development (USAID), with supplementary support from the United Nations Children's Fund (UNICEF). IIPS collaborated with a number of Field Organizations (FOs) for survey implementation. Each FO was responsible for conducting survey activities in one or more of the states covered by the NFHS. Technical assistance for the NFHS was provided by ICF and the East-West Center. [2]

## References:

1. https://data.gov.in/resource/india-districts-factsheets-national-family-health-survey-nfhs-5-2019-2021
2. http://rchiips.org/nfhs/about.shtml

In [2]:
df = pd.read_excel("NFHS_5_India_Districts_Factsheet_Data.xls")

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 707 entries, 0 to 706
Columns: 109 entries, District Names to Men age 15 years and above who consume alcohol (%)
dtypes: float64(75), int64(3), object(31)
memory usage: 602.2+ KB


In [4]:
df.columns

Index(['District Names', 'State/UT', 'Number of Households surveyed',
       'Number of Women age 15-49 years interviewed',
       'Number of Men age 15-54 years interviewed',
       'Female population age 6 years and above who ever attended school (%)',
       'Population below age 15 years (%)',
       ' Sex ratio of the total population (females per 1,000 males)',
       'Sex ratio at birth for children born in the last five years (females per 1,000 males)',
       'Children under age 5 years whose birth was registered with the civil authority (%)',
       ...
       'Men age 15 years and above wih Mildly elevated blood pressure (Systolic 140-159 mm of Hg and/or Diastolic 90-99 mm of Hg) (%)',
       'Men age 15 years and above wih Moderately or severely elevated blood pressure (Systolic ≥160 mm of Hg and/or Diastolic ≥100 mm of Hg) (%)',
       'Men age 15 years and above wih Elevated blood pressure (Systolic ≥140 mm of Hg and/or Diastolic ≥90 mm of Hg) or taking medicine to contro

In [5]:
df.describe()

Unnamed: 0,Number of Households surveyed,Number of Women age 15-49 years interviewed,Number of Men age 15-54 years interviewed,Female population age 6 years and above who ever attended school (%),Population below age 15 years (%),"Sex ratio of the total population (females per 1,000 males)","Sex ratio at birth for children born in the last five years (females per 1,000 males)",Children under age 5 years whose birth was registered with the civil authority (%),Population living in households with electricity (%),Population living in households with an improved drinking-water source1 (%),...,Men age 15 years and above wih Mildly elevated blood pressure (Systolic 140-159 mm of Hg and/or Diastolic 90-99 mm of Hg) (%),Men age 15 years and above wih Moderately or severely elevated blood pressure (Systolic ≥160 mm of Hg and/or Diastolic ≥100 mm of Hg) (%),Men age 15 years and above wih Elevated blood pressure (Systolic ≥140 mm of Hg and/or Diastolic ≥90 mm of Hg) or taking medicine to control blood pressure (%),Women (age 30-49 years) Ever undergone a screening test for cervical cancer (%),Women (age 30-49 years) Ever undergone a breast examination for breast cancer (%),Women (age 30-49 years) Ever undergone an oral cavity examination for oral cancer (%),Women age 15 years and above who use any kind of tobacco (%),Men age 15 years and above who use any kind of tobacco (%),Women age 15 years and above who consume alcohol (%),Men age 15 years and above who consume alcohol (%)
count,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,...,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0
mean,900.502122,1024.019802,144.002829,71.507822,26.356874,1020.696139,937.990184,91.064795,96.996025,93.726054,...,16.253621,6.042518,24.759661,1.565827,0.6529,0.700495,11.614965,40.599646,2.912631,23.19075
std,69.273371,177.064999,31.953268,10.311666,5.296601,73.367114,165.625452,9.392697,4.354175,8.71469,...,4.336475,2.573082,6.768313,2.774292,1.566614,1.468252,11.943028,14.081028,6.079181,13.36201
min,213.0,216.0,17.0,45.36,15.98,754.98,-1261.45,51.58,68.35,41.18,...,5.29,0.83,10.02,0.0,0.0,0.0,0.06,6.75,0.0,0.07
25%,882.0,911.0,124.0,64.395,22.505,969.055,864.85,87.03,96.39,92.03,...,13.19,4.105,19.825,0.19,0.0,0.0,4.09,30.42,0.27,13.585
50%,908.0,1020.0,145.0,71.34,25.36,1013.26,930.04,94.89,98.65,96.98,...,16.25,5.83,24.42,0.55,0.21,0.28,7.67,42.49,0.5,20.16
75%,931.0,1141.0,164.0,78.97,29.53,1065.53,1014.575,97.745,99.515,99.25,...,18.845,7.555,28.97,1.52,0.5,0.69,14.765,50.97,1.71,30.89
max,990.0,1621.0,241.0,99.16,50.56,1331.64,1484.97,100.0,100.0,100.0,...,32.86,19.46,49.61,23.22,14.55,15.84,70.58,80.56,42.77,68.38


For simplicity, I have used only the districts of Karnataka for the bar plot.

In [6]:
df_karnataka = df[df["State/UT"]=="Karnataka"].copy()
df_karnataka

Unnamed: 0,District Names,State/UT,Number of Households surveyed,Number of Women age 15-49 years interviewed,Number of Men age 15-54 years interviewed,Female population age 6 years and above who ever attended school (%),Population below age 15 years (%),"Sex ratio of the total population (females per 1,000 males)","Sex ratio at birth for children born in the last five years (females per 1,000 males)",Children under age 5 years whose birth was registered with the civil authority (%),...,Men age 15 years and above wih Mildly elevated blood pressure (Systolic 140-159 mm of Hg and/or Diastolic 90-99 mm of Hg) (%),Men age 15 years and above wih Moderately or severely elevated blood pressure (Systolic ≥160 mm of Hg and/or Diastolic ≥100 mm of Hg) (%),Men age 15 years and above wih Elevated blood pressure (Systolic ≥140 mm of Hg and/or Diastolic ≥90 mm of Hg) or taking medicine to control blood pressure (%),Women (age 30-49 years) Ever undergone a screening test for cervical cancer (%),Women (age 30-49 years) Ever undergone a breast examination for breast cancer (%),Women (age 30-49 years) Ever undergone an oral cavity examination for oral cancer (%),Women age 15 years and above who use any kind of tobacco (%),Men age 15 years and above who use any kind of tobacco (%),Women age 15 years and above who consume alcohol (%),Men age 15 years and above who consume alcohol (%)
251,Belgaum,Karnataka,907,1147,179,72.37,24.99,1031.82,892.03,96.55,...,14.66,4.9,22.22,0.47,0.58,0.77,6.6,28.77,0.45,11.47
252,Bagalkot,Karnataka,881,1138,182,67.29,26.5,1007.4,879.35,97.75,...,13.23,4.94,21.01,0.61,0.15,0.16,10.68,33.69,0.58,14.83
253,Bijapur,Karnataka,887,1091,149,66.11,28.04,994.96,884.69,95.47,...,13.22,5.1,20.95,0.68,0.17,0.37,10.76,34.6,0.78,15.26
254,Bidar,Karnataka,914,1181,172,68.23,27.27,1033.68,897.98,97.89,...,15.69,5.41,24.13,0.44,0.16,0.47,8.53,31.2,0.25,16.59
255,Raichur,Karnataka,891,1177,166,55.27,29.29,1033.4,907.03,88.73,...,10.43,4.33,16.67,0.75,0.36,0.52,13.47,28.42,0.89,15.77
256,Koppal,Karnataka,872,1017,171,63.92,28.76,994.25,951.54,95.07,...,14.21,5.08,22.16,0.49,0.37,0.19,19.87,34.12,1.22,13.0
257,Gadag,Karnataka,892,1136,193,73.3,24.13,1059.81,910.69,99.73,...,14.65,7.1,23.88,0.16,0.0,0.53,8.5,35.56,0.54,13.71
258,Dharwad,Karnataka,864,1051,156,78.22,22.09,1021.94,1109.57,99.67,...,15.01,6.09,23.69,0.29,0.14,0.0,6.83,36.79,0.46,15.77
259,Uttara Kannada,Karnataka,897,1000,164,81.14,18.16,1011.34,723.92,99.2,...,14.71,6.8,25.11,0.0,0.0,0.0,10.5,26.12,0.29,12.27
260,Haveri,Karnataka,875,1060,141,72.55,25.43,977.51,805.46,98.55,...,14.35,6.86,24.42,0.17,0.0,0.16,9.15,33.07,1.01,12.09


In [7]:
df_karnataka.describe()

Unnamed: 0,Number of Households surveyed,Number of Women age 15-49 years interviewed,Number of Men age 15-54 years interviewed,Female population age 6 years and above who ever attended school (%),Population below age 15 years (%),"Sex ratio of the total population (females per 1,000 males)","Sex ratio at birth for children born in the last five years (females per 1,000 males)",Children under age 5 years whose birth was registered with the civil authority (%),Population living in households with electricity (%),Population living in households with an improved drinking-water source1 (%),...,Men age 15 years and above wih Mildly elevated blood pressure (Systolic 140-159 mm of Hg and/or Diastolic 90-99 mm of Hg) (%),Men age 15 years and above wih Moderately or severely elevated blood pressure (Systolic ≥160 mm of Hg and/or Diastolic ≥100 mm of Hg) (%),Men age 15 years and above wih Elevated blood pressure (Systolic ≥140 mm of Hg and/or Diastolic ≥90 mm of Hg) or taking medicine to control blood pressure (%),Women (age 30-49 years) Ever undergone a screening test for cervical cancer (%),Women (age 30-49 years) Ever undergone a breast examination for breast cancer (%),Women (age 30-49 years) Ever undergone an oral cavity examination for oral cancer (%),Women age 15 years and above who use any kind of tobacco (%),Men age 15 years and above who use any kind of tobacco (%),Women age 15 years and above who consume alcohol (%),Men age 15 years and above who consume alcohol (%)
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,...,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,885.8,1015.166667,150.533333,71.601667,22.795,1041.493333,977.763,97.866333,99.151,94.943,...,17.035667,6.886333,26.875333,0.574667,0.304333,0.415667,9.673333,27.721667,0.974,16.942
std,19.35227,111.37482,19.591929,7.974482,3.465737,47.652405,130.592454,2.419375,0.492533,4.50384,...,3.130441,1.606537,4.758878,0.887514,0.265884,0.384113,4.090467,6.1407,0.500245,4.29228
min,837.0,814.0,108.0,51.88,18.06,967.27,723.92,88.73,97.59,81.48,...,10.43,4.33,16.67,0.0,0.0,0.0,2.78,15.09,0.25,11.47
25%,875.75,951.5,136.0,66.405,20.2875,1010.62,886.525,97.0675,98.835,94.11,...,14.6525,5.425,23.7375,0.19,0.15,0.16,6.8675,22.4975,0.5875,13.8075
50%,888.5,995.0,147.0,72.385,21.84,1032.61,952.35,98.5,99.205,95.825,...,17.825,6.98,28.3,0.38,0.26,0.365,9.47,28.595,0.925,16.18
75%,901.0,1100.0,165.5,75.8825,25.32,1063.9725,1105.4675,99.5675,99.4775,98.2675,...,19.2425,8.0925,29.8975,0.64,0.3675,0.565,11.1925,32.88,1.215,18.99
max,914.0,1242.0,193.0,85.72,29.29,1168.41,1189.95,100.0,99.88,99.89,...,22.32,9.41,33.99,4.95,0.97,1.75,19.87,36.94,2.27,27.88


# Bar Plot

**Fig. 1** Percentage of the female population aged 6 years and above who ever attended school, by district in Karnataka

In [8]:
plt.figure(figsize=(14, 15))
# Rotate the district names so that they do not overlap on the x-axis.
plt.xticks(rotation=90)
plt.bar(
    df_karnataka["District Names"],
    df_karnataka["Female population age 6 years and above who ever attended school (%)"],
)
plt.xlabel("District Names")
plt.ylabel("Female population age 6 years and above who ever attended school (%)")
plt.savefig("./barplot.png")


# Conclusion

Bangalore has the highest percentage of females who ever attended school (85.72%), while Yadgir has the lowest (51.88%).
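The highest and lowest districts can also be read off programmatically instead of from the chart. The following is a minimal sketch, assuming a small hand-copied subset of the Karnataka values printed elsewhere in this notebook; the names `school` and `pct` are illustrative and do not appear in the original code.

```python
import pandas as pd

# Hand-copied subset of the Karnataka values (percentage of females aged 6+
# who ever attended school). The full notebook has 30 districts.
school = pd.DataFrame(
    {
        "District Names": ["Bangalore", "Udupi", "Raichur", "Yadgir"],
        "pct": [85.72, 81.99, 55.27, 51.88],
    }
).set_index("District Names")

highest = school["pct"].idxmax()  # index label of the largest value
lowest = school["pct"].idxmin()   # index label of the smallest value

print(f"Highest: {highest} ({school.loc[highest, 'pct']}%)")
print(f"Lowest: {lowest} ({school.loc[lowest, 'pct']}%)")
```

Applied to `df_karnataka`, the same idea would need the district column set as the index first.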

# Box Plot

**Fig. 2** Sex Ratio distribution throughout India
1. Sex ratio at birth for children born in the last five years (females per 1,000 males)
2. Sex ratio of the total population (females per 1,000 males)

In [9]:
plt.figure(figsize=(8, 8))
box = plt.boxplot(
    [
        df["Sex ratio at birth for children born in the last five years (females per 1,000 males)"],
        df[" Sex ratio of the total population (females per 1,000 males)"],
    ],
    patch_artist=True,  # draw the boxes as patches so that they can be filled
    labels=["Sex ratio at birth for children born in the last five years", "Sex ratio of the total population"],
)
colors = ["cyan", "cyan"]

for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

plt.savefig("./boxplot.png")


We can observe that the sex ratio at birth also takes negative values (the minimum reported by `df.describe()` is -1261.45), which is not possible for a ratio.

![image.png](attachment:image.png)

From this figure, we can see that the negative values are actually written within parentheses in the Excel sheet. According to [1], a value written within parentheses () means that the ratio, percentage, or average was computed from only 25 to 49 sample values. Spreadsheet software commonly uses parentheses to display negative numbers, which is presumably why these entries were read in as negative. For this figure, it means that in the districts with negative (bracketed) sex ratio values, the number of men examined or surveyed is between 25 and 49, so the value is not truly representative of the population being observed.

In the Excel sheet, we can observe that in Bhopal only 38 men (age 15-54 years) were surveyed, which is consistent with this explanation.

To handle this situation, we can either remove the values within parentheses (the negative values), for example with the `df.drop()` function, or keep them and take their absolute value. I have kept those values, taking the absolute value in the next code cell.

The next figure shows the corrected box plot after this "data preprocessing" step.
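The two options described above can be sketched on a toy column. This is only an illustration under stated assumptions: the column name `ratio` is made up, and the values are a few sex-ratio-at-birth figures copied from the tables earlier in this notebook, including the negative minimum.

```python
import pandas as pd

# Toy stand-in for the sex-ratio-at-birth column; -1261.45 plays the role
# of a bracketed value that was read in as a negative number.
toy = pd.DataFrame({"ratio": [892.03, -1261.45, 879.35, 884.69]})

# Option 1: drop the rows whose value is negative (bracketed in the sheet).
dropped = toy.drop(toy[toy["ratio"] < 0].index)

# Option 2: keep every row and restore the magnitude of the value.
corrected = toy["ratio"].abs()

print(len(toy), len(dropped))
print(corrected.tolist())
```

Option 2 keeps every district in the plot but also keeps estimates based on few cases, while Option 1 trades coverage for reliability.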

## References:
1. https://www.dhsprogram.com/pubs/pdf/DM56/DM56.pdf

In [10]:
plt.figure(figsize=(8, 8))
box = plt.boxplot(
    [
        # Bracketed values were read in as negatives; abs() restores their magnitude.
        abs(df["Sex ratio at birth for children born in the last five years (females per 1,000 males)"]),
        df[" Sex ratio of the total population (females per 1,000 males)"],
    ],
    patch_artist=True,  # draw the boxes as patches so that they can be filled
    labels=["Sex ratio at birth for children born in the last five years", "Sex ratio of the total population"],
)
plt.ylabel("Sex Ratio")
colors = ["cyan", "cyan"]

for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

plt.savefig("./boxplot_corrected.png")


# Conclusion

We can observe that the sex ratio at birth is generally lower than the sex ratio of the total population. This suggests that the overall sex ratio may decrease in the future. It may also indicate that some households still prefer male children to female children, although the box plot alone cannot confirm this.
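The comparison can be checked numerically as well as visually. The sketch below uses only the first ten Karnataka rows shown in the table earlier in this notebook, not the all-India data, so its numbers differ from those behind the box plot; the short column names `at_birth` and `total` are illustrative.

```python
import pandas as pd

# First ten Karnataka districts, copied from the table shown earlier.
sample = pd.DataFrame(
    {
        "at_birth": [892.03, 879.35, 884.69, 897.98, 907.03,
                     951.54, 910.69, 1109.57, 723.92, 805.46],
        "total": [1031.82, 1007.4, 994.96, 1033.68, 1033.4,
                  994.25, 1059.81, 1021.94, 1011.34, 977.51],
    }
)

# The median is the line drawn inside each box of a box plot.
medians = sample.median()
print(medians)
print("At-birth median is lower:", medians["at_birth"] < medians["total"])
```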

# Scatter Plot

The dataset used for this plot is the Consumer Price Index dataset. [1]

The Consumer Price Index (CPI) measures changes over time in the general level of prices of goods and services that households acquire for the purpose of consumption. Constructing CPI numbers requires two components: weighting diagrams (consumption patterns) and price data collected at regular intervals. The data refer to the All India Consumer Price Index with base year 2012 = 100, combined for rural and urban areas. Between January 2021 and November 2021, the indices changed as follows [2]:

- Cereals and products: from 144.9 to 148.2, an increase of 3.3 points
- Vegetables: from 194.2 to 199.2, an increase of 5 points
- Fruits: from 149.6 to 156.5, an increase of 6.9 points
- General index (rural and urban combined): from 157.3 to 166.7, an increase of 9.4 points
- General index (rural): from 158.5 to 167.6, an increase of 9.1 points
- General index (urban): from 156 to 165.6, an increase of 9.6 points
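The point increases quoted above are simple differences between the November and January index values. As a quick check, the sketch below recomputes them; the percentage change relative to January is an extra column that the source description does not report.

```python
# (January 2021, November 2021) index values quoted in the description above.
cpi = {
    "Cereals and products": (144.9, 148.2),
    "Vegetables": (194.2, 199.2),
    "Fruits": (149.6, 156.5),
    "General (rural + urban)": (157.3, 166.7),
    "General (rural)": (158.5, 167.6),
    "General (urban)": (156.0, 165.6),
}

changes = {}
for name, (jan, nov) in cpi.items():
    points = round(nov - jan, 1)                 # change in index points
    percent = round(100 * (nov - jan) / jan, 2)  # change relative to January
    changes[name] = (points, percent)
    print(f"{name}: +{points} points ({percent}%)")
```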

## References:

1. https://data.gov.in/resource/all-india-consumer-price-index-ruralurban-upto-november-2021
2. https://visualize.data.gov.in/?inst=a5df75bc-4578-48ad-bc9d-e6eb4b63de0a&vid=106532

### Note that I have plotted only from Jan 2013 to Apr 2016 for simplicity

In [11]:
df = pd.read_csv("./All_India_Index_july2019_20Aug2020_dec20_1_2.csv")

In [12]:
plt.figure(figsize=(15, 15))
plt.xticks(rotation=90)
# Keep the first 40 combined (rural + urban) rows, i.e. Jan 2013 to Apr 2016.
df_copy = df[df["Sector"] == "Rural+Urban"][:40].copy()
# Build one x-axis label per row from the second and third columns of the file.
x = (df_copy.iloc[:, 1].astype(str) + " " + df_copy.iloc[:, 2].astype(str)).tolist()
plt.scatter(x=x, y=df_copy["Cereals and products"], label="Cereals and Products")
plt.scatter(x=x, y=df_copy["Fruits"], label="Fruits")
plt.scatter(x=x, y=df_copy["Vegetables"], label="Vegetables")
plt.legend()
plt.xlabel("Month and Year")
plt.ylabel("Consumer Price Index")
plt.savefig("./scatter.png")
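Matplotlib treats string labels like the ones built in this cell as categories and plots them in order of appearance. If a true time axis is wanted, the labels can be parsed into dates first. The following is a hedged sketch, assuming the two columns hold a four-digit year and a full English month name; the column names `Year` and `Month` are placeholders and are not taken from the CSV.

```python
import pandas as pd

# Toy frame with hypothetical column names standing in for the CSV columns.
toy = pd.DataFrame(
    {
        "Year": [2013, 2013, 2013],
        "Month": ["January", "February", "March"],
    }
)

# Join year and month into one string, then parse it as a date.
# The day defaults to the first of the month.
toy["Date"] = pd.to_datetime(
    toy["Year"].astype(str) + " " + toy["Month"], format="%Y %B"
)
print(toy)
```

Passing `toy["Date"]` as the x values to `plt.scatter` would then give a chronologically ordered axis.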


In [13]:
for t in list(zip(df_karnataka["District Names"],df_karnataka["Female population age 6 years and above who ever attended school (%)"])):
    print(t)


('Belgaum ', 72.37)
('Bagalkot ', 67.29)
('Bijapur ', 66.11)
('Bidar ', 68.23)
('Raichur ', 55.27)
('Koppal ', 63.92)
('Gadag ', 73.3)
('Dharwad ', 78.22)
('Uttara Kannada ', 81.14)
('Haveri ', 72.55)
('Bellary ', 64.17)
('Chitradurga ', 72.95)
('Davanagere ', 74.9)
('Shimoga ', 78.77)
('Udupi ', 81.99)
('Chikmagalur ', 76.21)
('Tumkur ', 72.34)
('Bangalore ', 85.72)
('Mandya ', 67.69)
('Hassan ', 74.38)
('Dakshina Kannada ', 83.89)
('Kodagu ', 83.63)
('Mysore ', 72.4)
('Chamarajanagar ', 63.19)
('Gulbarga ', 64.95)
('Yadgir ', 51.88)
('Kolar ', 71.64)
('Chikkaballapura ', 65.87)
('Bangalore Rural ', 74.25)
('Ramanagara ', 68.83)
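The district names printed above carry a trailing space (for example `'Belgaum '`), so an exact comparison such as `== "Belgaum"` would not match. Below is a minimal sketch of normalizing the names with `str.strip()`, using a three-row stand-in for `df_karnataka`; the variable `names` is illustrative.

```python
import pandas as pd

# Three names copied from the output above, including the trailing space.
names = pd.DataFrame({"District Names": ["Belgaum ", "Bagalkot ", "Bijapur "]})

# Count exact matches before and after stripping surrounding whitespace.
matches_before = (names["District Names"] == "Belgaum").sum()
names["District Names"] = names["District Names"].str.strip()
matches_after = (names["District Names"] == "Belgaum").sum()

print(matches_before, matches_after)
```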


# Conclusion

We can observe an increasing trend in the price index of all three commodities. However, the index for vegetables exhibits large fluctuations.