In [3]:
# import libraries
import numpy as np
import math
import pandas as pd

In [1]:
# Cloning the repo. It has more information about the data.
!git clone https://github.com/raam93/districts_in_india.git

Cloning into 'districts_in_india'...
remote: Counting objects: 10, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 10 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (10/10), done.
Checking connectivity... done.


# Reading Data

'datafile_old.csv' is the data file from https://data.gov.in/resources/district-wise-availability-health-centres-india-31st-march-2017. It contains the district-wise availability of health centres in India as of 31st March, 2017.

In [4]:
dist_old = pd.read_csv('districts_in_india/datafile_old.csv', delimiter= " ")
dist_old.head()

Unnamed: 0,S No.,States/Union Territory,Name of the District,Sub Centres,PHCs,CHCs,Sub Divisional Hospital,District Hospital
0,1,Andhra Pradesh,Srikakulam,465,80,16,2,0.0
1,1,Andhra Pradesh,Vizianagaram,431,68,11,1,1.0
2,1,Andhra Pradesh,Visakhapatnam,583,89,11,2,0.0
3,1,Andhra Pradesh,East Godavari,840,128,26,3,1.0
4,1,Andhra Pradesh,West Godavari,635,91,14,3,1.0


'datafile_new.csv' is the data file from https://www.askbankifsccode.com/blog/list-of-all-states-union-territories-and-districts-in-india/, which claims to list all districts and states as of March, 2018. The information in this file was also cross-verified against the official data (not available in tabular format) in the Government of India web directory http://www.goidirectory.gov.in/district_categories1.php?ou=TN.

In [5]:
dist_new = pd.read_csv('districts_in_india/datafile_new.csv')
dist_new.head()

Unnamed: 0,State,District,State Type,Unnamed: 3,Unnamed: 4,Last Updated: 28-Mar-2018,Unnamed: 6
0,Andaman Nicobar,Nicobar,Union Territory,,,For Updated Visit:,www.askbankifsccode.com
1,Andaman Nicobar,North Middle Andaman,Union Territory,,,,
2,Andaman Nicobar,South Andaman,Union Territory,,,,
3,Andhra Pradesh,Anantapur,State,,,,
4,Andhra Pradesh,Chittoor,State,,,,


# Processing Data

Each of the above dataframes is grouped by state, and the total no. of **unique** districts in each state is counted.

In [7]:
# Subtracting 1 from the count because the old file has an extra 'Total' record that is not needed for this analysis
old_data = dist_old.groupby('States/Union Territory')['Name of the District'].nunique()-1
new_data = dist_new.groupby('State')['District'].nunique()
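A blanket `-1` assumes every group contains a 'Total' row. As a minimal sketch on made-up toy data (the frame `toy_old` and its values are hypothetical; only the column names come from the old file), the summary rows can instead be filtered out before counting:

```python
import pandas as pd

# Toy data, invented for illustration. State "B" has no 'Total' row,
# and "All India" consists only of a 'Total' row.
toy_old = pd.DataFrame({
    "States/Union Territory": ["A", "A", "A", "B", "All India"],
    "Name of the District": ["d1", "d2", "Total", "d3", "Total"],
})

# Drop the summary rows first, then count distinct district names per state.
districts_only = toy_old[toy_old["Name of the District"] != "Total"]
counts = districts_only.groupby("States/Union Territory")["Name of the District"].nunique()

print(counts.to_dict())  # {'A': 2, 'B': 1}
```

With this approach a group without a 'Total' row keeps its true count, and a group made up only of summary rows drops out on its own.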

In [8]:
# Storing the grouped data locally to check for any anomalies.
# There will be at most 36 rows in each file, so this shouldn't take much time
old_data.to_csv('old_group.csv')
new_data.to_csv('new_group.csv')

In [9]:
old_group = pd.read_csv('old_group.csv', delimiter= ",", header=None)
new_group = pd.read_csv('new_group.csv', delimiter= ",", header=None)
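If the CSV files are not needed for manual inspection, the round trip through disk can probably be skipped. The sketch below uses a hypothetical stand-in Series (`counts`, with invented values) in place of `new_data` and turns it into a two-column dataframe directly:

```python
import pandas as pd

# Stand-in for the grouped Series; the values are invented for illustration.
counts = pd.Series([3, 13], index=pd.Index(["State A", "State B"], name="State"), name="District")

# Name the value column and the index, then move the index into a regular column.
group = counts.rename("no_of_districts_2018").rename_axis("state").reset_index()

print(list(group.columns))  # ['state', 'no_of_districts_2018']
```

This also avoids relying on how `to_csv` and `read_csv` treat the header row.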

In [11]:
# Clearing anomalies
old_group.columns = ['state', 'no_of_districts_2017']
new_group.columns = ['state', 'no_of_districts_2018']
# .copy() gives an independent frame, so the edits below do not trigger SettingWithCopyWarning
old_group = old_group[old_group.state != 'All India'].copy()
old_group['state'] = old_group['state'].replace({'A& N Islands':'Andaman Nicobar', 'D & N Haveli':'Dadra Nagar Haveli', 'Daman & Diu':'Daman Diu', 
                                  'Jammu & Kashmir':'Jammu Kashmir','colored':'bad'})
# A count of 0 means the state had only one distinct name before the -1 above, so restore it to 1
old_group['no_of_districts_2017'] = old_group['no_of_districts_2017'].replace(0, 1)


In [17]:
# total no. of districts in India as of March, 2018.
new_group['no_of_districts_2018'].sum()

718

In [14]:
# Getting records with unequal no. of districts
# join dataframes
grouped_data = pd.merge(old_group, new_group, on='state')
states_with_new_districts = grouped_data[grouped_data['no_of_districts_2017'] != grouped_data['no_of_districts_2018']]
states_with_new_districts

Unnamed: 0,state,no_of_districts_2017,no_of_districts_2018
2,Arunachal Pradesh,20,21
3,Assam,27,33
12,Haryana,21,22
20,Maharashtra,34,36
21,Manipur,9,16
23,Mizoram,9,8
35,West Bengal,19,23


In [15]:
states_with_new_districts_list = states_with_new_districts['state'].tolist()
states_with_new_districts_list

['Arunachal Pradesh',
 'Assam',
 'Haryana',
 'Maharashtra',
 'Manipur',
 'Mizoram',
 'West Bengal']

In [16]:
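# Finding new districts
def Difference_bn_lists(list1, list2):
    # Symmetric difference: names that appear in exactly one of the two lists,
    # so spelling variants of the same district show up from both files
    list_dif = [elem for elem in list1 + list2 if elem not in list1 or elem not in list2]
    return list_dif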

for state in states_with_new_districts_list:
  districts_2017 = dist_old[dist_old['States/Union Territory'] == state]['Name of the District'].tolist()
  try:
    districts_2017.remove('Total')
  except ValueError:
    pass
  
  districts_2018 = dist_new[dist_new['State'] == state]['District'].tolist()
  
  if len(districts_2018) > len(districts_2017):
    #new_districts = list(set(districts_2018) - set(districts_2017))
    new_districts = Difference_bn_lists(districts_2018, districts_2017)
    
    print('New districts in ' + state + ' are:')
    print(new_districts)
    print("")
  else:
    print('something is different with ' + state)
    #new_districts = list(set(districts_2017) - set(districts_2018))
    new_districts = Difference_bn_lists(districts_2018, districts_2017)
    
    print(new_districts)
    print("")

New districts in Arunachal Pradesh are:
['Central Siang', 'Lower Siang', 'Siang']

New districts in Assam are:
['Biswanath', 'Charaideo', 'Hojai', 'Kamrup', 'Kamrup Metropolitan', 'Majuli', 'South Salmara-Mankachar', 'West Karbi Anglong', 'Kamrup (Metro)', 'Kamrup (Rural)']

New districts in Haryana are:
['Charkhi Dadri', 'Gurugram', 'Mahendragarh', 'Mewat', 'Gurugram (old name Gurgaon)', 'Mahendragarh (Narnaul)', 'Nuh (old name Mewat)']

New districts in Maharashtra are:
['Mumbai City', 'Mumbai Suburban', 'Sindhudurg', 'Sindhudurga']

New districts in Manipur are:
['Jiribam', 'Kakching', 'Kamjong', 'Kangpokpi', 'Noney', 'Pherzawl', 'Tengnoupal']

something is different with Mizoram
['Aizawl', 'Champhai', 'Kolasib', 'Lawngtlai', 'Lunglei', 'Mamit', 'Saiha', 'Serchhip', 'Aizawl East District', 'Aizawl West District', 'Champhai District', 'Kolasib District', 'Lawngtlai District', 'Lunglei District', 'Mamit District', 'Saiha District', 'Serchhip District']

New districts in West Bengal ar
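Because the helper returns a symmetric difference, the printed lists mix names found only in the 2018 file with names found only in the 2017 file. A small sketch of splitting the two directions with set differences, using the Maharashtra names from the output above ('Pune' is added here only as a placeholder for a name shared by both files):

```python
# Names from the Maharashtra output above; 'Pune' stands in for any shared name.
districts_2017 = ["Sindhudurga", "Pune"]
districts_2018 = ["Mumbai City", "Mumbai Suburban", "Sindhudurg", "Pune"]

# Directional differences keep the two sources apart.
only_in_2018 = sorted(set(districts_2018) - set(districts_2017))
only_in_2017 = sorted(set(districts_2017) - set(districts_2018))

print(only_in_2018)  # ['Mumbai City', 'Mumbai Suburban', 'Sindhudurg']
print(only_in_2017)  # ['Sindhudurga']
```

Names in `only_in_2017` are candidates for renames or misspellings rather than new districts.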

# Interpretation of results:
In this analysis, the date of creation of a district is taken to be the date given in the respective Government's Gazette Notification announcing the formation of that district.

## States with new districts:

-  **Arunachal Pradesh:** 
    -  ***Lower Siang***: Created on March 3, 2014 (https://rbi.org.in/scripts/FS_Notification.aspx?Id=10980&fn=2754&Mode=0). However, it became operational only in September, 2017 (https://www.telegraphindia.com/1170923/jsp/northeast/story_174524.jsp)  
    -  ***Siang***: It seems that Siang and Central Siang are used interchangeably.


-  **Assam:**
    Assam got *8 new districts* in January, February and August 2016. (https://www.rbi.org.in/scripts/FS_Notification.aspx?Id=11207&fn=2754&Mode=0)
    

-  **Haryana:** 
    -  ***Charkhi Dadri:*** Formed as a new district from a municipality on December 1, 2016 (https://rbidocs.rbi.org.in/rdocs/notification/PDFs/NOTI29275FA2A1EB189410E95382B00CAB00CB7.PDF) 
    -  ***Mahendragarh*** is an old district missing from the old data of 2017. ***Mewat*** is an old district renamed as ***Nuh*** in 2016.
    
    
-  **Manipur:**
    All *7 new districts* in Manipur were created on December 8, 2016 (http://manipur.gov.in/wp-content/uploads/2016/12/creation-of-7-district-1-2.pdf,  http://www.imphaltimes.com/news/item/7397-7-new-districts-including-jiribam-district-created-amidst-chaos)
    

## States whose old districts were erroneously recorded:

-  **Maharashtra:** 
    ***Mumbai City*** and ***Mumbai Suburban*** seem to be missing from the old data, and ***Sindhudurg*** is erroneously recorded as ***Sindhudurga*** in the old data.


-  **Mizoram:**
    The east/west split of the ***Aizawl*** district in the old data seems wrong. Several websites and a Google search confirm that Aizawl is one district.


-  **West Bengal:**
    ***Cooch Behar*** and ***Koch Bihar*** are the same old district. ***Darjeeling***, ***Hooghly*** and ***Howrah*** are spelled differently in the two files. ***Jhargram***, ***Kalimpong*** and ***Kolkata*** are old districts missing from the old data. Similarly, the other districts are either misspelled in or missing from the old data. No new districts in West Bengal.
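Most of the mismatches above come from spelling variants, a trailing "District" or a parenthetical note rather than from genuinely new districts. As a rough sketch, names could be normalized and then paired with the standard-library `difflib` before anything is declared new. The helper `normalize` and the 0.8 cutoff are assumptions made for illustration, and any automatic pairing would still need a manual check:

```python
import difflib
import re

def normalize(name):
    """Lower-case a name and drop parenthetical notes and the word 'District'."""
    name = re.sub(r"\(.*?\)", "", name)
    name = re.sub(r"\bdistrict\b", "", name, flags=re.IGNORECASE)
    return " ".join(name.lower().split())

# Names taken from the outputs above.
names_2017 = ["Sindhudurga", "Champhai District", "Mahendragarh (Narnaul)"]
names_2018 = ["Sindhudurg", "Champhai", "Mahendragarh", "Jiribam"]

normalized_2017 = [normalize(n) for n in names_2017]

# A 2018 name with no close 2017 counterpart is a candidate for a new district.
genuinely_new = [
    n for n in names_2018
    if not difflib.get_close_matches(normalize(n), normalized_2017, n=1, cutoff=0.8)
]

print(genuinely_new)  # ['Jiribam']
```

The cutoff trades false matches against missed ones, so borderline pairs are best reviewed by hand.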

So, based on the above analysis, there were **718** districts in India as of March, 2018. The latest 7 districts were created in **Manipur** on **December 8, 2016**.