# $$\textrm{Lok Sabha data (2004-2019)}$$

# Table of contents

* [Import Libraries](#T1)
* [Import Dataset](#T2)
* [Exploratory Analysis](#T3)
* [Relational Graphs](#T4)
* [Education Bar plots](#T5)
* [Data sorted by criminal records](#T6)
* [City with highest crimes](#T7)
* [Candidate with most cases](#T8)

<a id='T1'></a>
# Import libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

<a id='T2'></a>
# Import the dataset

This notebook uses four CSV files, one per election year (2004, 2009, 2014, 2019).

In [None]:
df_2004 = pd.read_csv("../input/lok-sabha-election-candidate-list-2004-to-2019/LokSabha2004.csv")
df_2009 = pd.read_csv("../input/lok-sabha-election-candidate-list-2004-to-2019/LokSabha2009.csv")
df_2014 = pd.read_csv("../input/lok-sabha-election-candidate-list-2004-to-2019/LokSabha2014.csv")
df_2019 = pd.read_csv("../input/lok-sabha-election-candidate-list-2004-to-2019/LokSabha2019.csv")

Add a `Year` column to each dataset so that every row stays tied to its election year.

In [None]:
df_2004["Year"] = 2004
df_2009["Year"] = 2009
df_2014["Year"] = 2014
df_2019["Year"] = 2019
Frames = [df_2004,df_2009,df_2014,df_2019]

Concatenate the datasets row-wise (`axis=0`, the default for `pd.concat`).

In [None]:
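# ignore_index=True gives the combined frame a unique 0..n-1 index instead of repeating each file's own index
Data = pd.concat(Frames, ignore_index=True)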
Data
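
A minimal sketch of the stamping and concatenation step, run on two toy frames rather than the real CSV files. The column names and values here are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

# Toy stand-ins for two yearly files; names and values are hypothetical.
toy_2004 = pd.DataFrame({"Candidate": ["A", "B"], "Criminal Cases": [0, 2]})
toy_2009 = pd.DataFrame({"Candidate": ["C"], "Criminal Cases": [1]})

# Stamp each frame with its election year before combining.
toy_2004["Year"] = 2004
toy_2009["Year"] = 2009

# axis=0 (the default) stacks rows; ignore_index=True renumbers them 0..n-1.
combined = pd.concat([toy_2004, toy_2009], axis=0, ignore_index=True)
print(combined)
```

Stacking along `axis=0` keeps the columns aligned by name, so the `Year` column tells the rows of different files apart afterwards.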

<a id='T3'></a>
# Exploratory Analysis
This exploratory analysis plots the combined data using matplotlib and seaborn, both imported above.

**Educational qualification of the candidates**

In [None]:
# Share of candidates per education level, arranged in a fixed order for the pie chart
education_order = ['Doctorate', 'Graduate Professional', 'Graduate', 'Post Graduate',
                   '12th Pass', '10th Pass', '8th Pass', '5th Pass',
                   'Literate', 'Not Given', 'Illiterate', 'Others']
df1 = pd.DataFrame(Data['Education'].value_counts(normalize=True), index=education_order)
plot = df1.plot.pie(subplots=True, autopct='%1.1f%%', figsize=(10, 10))
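
As a rough illustration of what feeds the pie chart, the sketch below computes normalized shares on a toy `Education` column and arranges them in a chosen order. The toy values and the `order` list are assumptions for demonstration only.

```python
import pandas as pd

# Toy education column; the values are hypothetical.
toy = pd.Series(["Graduate", "Graduate", "10th Pass", "Illiterate"], name="Education")

# Desired display order; "Doctorate" is deliberately absent from the toy data.
order = ["Graduate", "10th Pass", "Illiterate", "Doctorate"]

# normalize=True returns fractions that sum to 1 instead of raw counts.
shares = toy.value_counts(normalize=True).reindex(order)

# Labels that never occur become NaN after reindexing; drop them before plotting.
shares = shares.dropna()
print(shares)
```

Reindexing only reorders the categories; it does not change the fractions themselves.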



> More than 25% of the candidates are not literate enough to govern.


<a id='T4'></a>
# Relational Graphs

In [None]:
sns.pairplot(Data);

Criminal cases are higher among candidates aged 40-60.
Criminal cases also seem to be higher in 2019.
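
One way to check observations like these numerically, rather than by eye, is to bin candidates by age and compare group averages. The sketch below uses toy data; the `Age` column name and the bin edges are assumptions and may not match the actual dataset.

```python
import pandas as pd

# Toy data; "Age" is an assumed column name and all values are hypothetical.
toy = pd.DataFrame({
    "Age": [28, 45, 52, 67, 41, 33],
    "Criminal Cases": [0, 3, 1, 0, 2, 1],
    "Year": [2014, 2019, 2019, 2014, 2019, 2014],
})

# Bin ages into right-closed intervals: (24, 40], (40, 60], (60, 100].
toy["Age group"] = pd.cut(toy["Age"], bins=[24, 40, 60, 100],
                          labels=["25-40", "41-60", "61+"])

# Average number of cases per age group and per election year.
by_age = toy.groupby("Age group", observed=True)["Criminal Cases"].mean()
by_year = toy.groupby("Year")["Criminal Cases"].mean()
print(by_age)
print(by_year)
```

Averages are used here rather than totals so that groups with more candidates do not dominate simply because of their size.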

<a id='T5'></a>
# Education bar plot

In [None]:
Data['Education'].value_counts().plot(kind='barh', figsize=(20,10))
plt.xlabel("no. of candidates")
plt.ylabel("qualification");

In [None]:
Data.hist(bins = 30, figsize=(15,15), color= 'Blue');

<a id='T6'></a>
# Data sorted by criminal records

In [None]:
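# Note: nunique() counts the distinct values of 'Criminal Cases' seen within each party,
# not the total number of cases filed against its candidates
df1 = pd.DataFrame(Data.groupby('Party')['Criminal Cases'].nunique())
df1.sort_values(by=['Criminal Cases'], inplace=True)
df1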
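
The aggregation chosen for a grouped column changes what the ranking means. As a hedged illustration on toy data (party names and values are hypothetical), the sketch below contrasts `nunique()`, which counts distinct values, with `sum()`, which totals the cases per party.

```python
import pandas as pd

# Toy data; party labels and case counts are hypothetical.
toy = pd.DataFrame({
    "Party": ["X", "X", "X", "Y", "Y"],
    "Criminal Cases": [2, 2, 5, 0, 1],
})

grouped = toy.groupby("Party")["Criminal Cases"]

# Number of distinct values per party: X has {2, 5}, Y has {0, 1}.
distinct_values = grouped.nunique()

# Total cases per party: X has 2 + 2 + 5, Y has 0 + 1.
total_cases = grouped.sum()

print(distinct_values)
print(total_cases)
```

In this toy example both parties tie on distinct values even though their totals differ widely, so the two aggregations can rank parties very differently.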

**Parties with the highest number of criminal records**

In [None]:
df1[df1['Criminal Cases']>=10]

**Parties with the fewest criminal records**

In [None]:
df1[df1['Criminal Cases']==1]

<a id='T7'></a>
# City with highest crime

In [None]:
df2 = pd.DataFrame(Data.groupby('City')['Criminal Cases'].nunique())
df2.sort_values(by=['Criminal Cases'], inplace=True)
# Keep the 10 highest cities and turn the 'City' index into a regular column
df2 = df2.tail(10).reset_index()

In [None]:
dt = df2["Criminal Cases"]
plt.figure(figsize=(12,8))
sns.barplot(y="City", x="Criminal Cases", data=df2)
plt.xlim([5,10])
plt.xlabel("no. of distinct criminal case counts")
# Annotate each bar with its value; enumerate gives the bar position regardless of the index labels
for i, value in enumerate(dt):
    plt.text(value, i, value)
plt.title("10 cities with the highest criminal records")
plt.ylabel("City");

<a id='T8'></a>
# Candidate with most cases

In [None]:
df3 = Data.sort_values(by=['Criminal Cases']).tail(10)
df3 
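
An alternative way to pick the rows with the largest values is `DataFrame.nlargest`, which returns them already ordered from highest to lowest. This is a sketch on toy data with hypothetical candidate labels, not a replacement for the cell above.

```python
import pandas as pd

# Toy data; candidate labels and case counts are hypothetical.
toy = pd.DataFrame({
    "Candidate": ["A", "B", "C", "D", "E"],
    "Criminal Cases": [3, 0, 7, 1, 5],
})

# The three rows with the most cases, in descending order.
top3 = toy.nlargest(3, "Criminal Cases")
print(top3)
```

Compared with sorting and taking the tail, the result comes out in descending order, which is usually the order wanted for a ranked bar chart.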

In [None]:
# catplot is a figure-level function and creates its own figure, so size it with height/aspect
sns.catplot(x="Criminal Cases", y="Candidate", data=df3, kind="bar", height=8, aspect=1.5);
plt.ylabel("Candidates");
