**Program Goal: Rearrange the input data file (CRPI-Main.csv) and write the result to an output file (CRPI-Mod.csv) that the other programs in this project can use directly**

**1) Import the required Python Packages/Libraries**

In [1]:
#Import required python packages
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

**2) Read the Data file and check**

In [20]:
# Read the input data file and save it in a Pandas dataframe format 
df = pd.read_csv('D:/CRPI-Latest1/Data-Files/CRPI-Main.csv')

In [21]:
# Check the number of rows and columns of the input file dataframe
df.shape

(152, 13)

In [4]:
# Display the first 5 records of the input file Dataframe
df.head()

,Year,City,Population (in Lakhs) (2011)+,Murder,Kidnapping,Crime against women,Crime against children,Crime Committed by Juveniles,Crime against Senior Citizen,Crime against SC,Crime against ST,Economic Offences,Cyber Crimes
0,2014,Ahmedabad,63.5,82,367,1371,437,215,68,66,6,399,32
1,2015,Ahmedabad,63.5,94,332,1067,609,157,17,60,9,378,28
2,2016,Ahmedabad,63.5,103,376,1126,481,258,362,96,10,479,77
3,2017,Ahmedabad,63.5,90,263,1405,600,405,534,119,6,608,112
4,2018,Ahmedabad,63.5,98,277,1416,733,352,733,145,9,842,212


In [5]:
# Display the last 5 records of the input file Dataframe
df.tail()

,Year,City,Population (in Lakhs) (2011)+,Murder,Kidnapping,Crime against women,Crime against children,Crime Committed by Juveniles,Crime against Senior Citizen,Crime against SC,Crime against ST,Economic Offences,Cyber Crimes
147,2017,Surat,45.8,89,332,559,526,436,131,32,10,719,105
148,2018,Surat,45.8,108,373,712,1075,409,161,29,13,829,155
149,2019,Surat,45.8,97,358,1015,770,516,232,34,19,804,228
150,2020,Surat,45.8,116,163,633,419,298,69,20,12,401,204
151,2021,Surat,45.8,121,270,622,479,355,66,22,19,663,296


In [6]:
# Display the complete information about the input file Dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 152 entries, 0 to 151
Data columns (total 13 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Year                           152 non-null    int64  
 1   City                           152 non-null    object 
 2   Population (in Lakhs) (2011)+  152 non-null    float64
 3   Murder                         152 non-null    int64  
 4   Kidnapping                     152 non-null    int64  
 5   Crime against women            152 non-null    int64  
 6   Crime against children         152 non-null    int64  
 7   Crime Committed by Juveniles   152 non-null    int64  
 8   Crime against Senior Citizen   152 non-null    int64  
 9   Crime against SC               152 non-null    int64  
 10  Crime against ST               152 non-null    int64  
 11  Economic Offences              152 non-null    int64  
 12  Cyber Crimes                   152 non-null    int

**3) Check for missing values and duplicate records**

In [7]:
# Checking for columnwise missing data
df.isnull().sum()

Year                             0
City                             0
Population (in Lakhs) (2011)+    0
Murder                           0
Kidnapping                       0
Crime against women              0
Crime against children           0
Crime Committed by Juveniles     0
Crime against Senior Citizen     0
Crime against SC                 0
Crime against ST                 0
Economic Offences                0
Cyber Crimes                     0
dtype: int64

**Note: There are no missing values in the given input data file**
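
A check like the one above can also be reduced to a list of the affected columns, which is easier to scan when a file has many columns. The following is a minimal sketch, not part of the original program: the `sample` dataframe and the `count_missing` helper are hypothetical and only stand in for the real input file.

```python
import pandas as pd


def count_missing(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column."""
    return frame.isnull().sum()


# Hypothetical stand-in for the input file, with one value deliberately missing
sample = pd.DataFrame({
    "Year": [2014, 2015, 2016],
    "City": ["Ahmedabad", "Ahmedabad", None],
    "Murder": [82, 94, 103],
})

missing = count_missing(sample)

# Keep only the columns that actually contain missing values
columns_with_gaps = missing[missing > 0].index.tolist()
print(columns_with_gaps)
```

An empty list here would correspond to the all-zero output shown for the real input file.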

In [8]:
# Record the dataframe shape before removing duplicate records
df.shape

(152, 13)

In [9]:
# Remove duplicate records, if any
df.drop_duplicates(inplace = True)

In [10]:
# Check the dataframe shape again after removing duplicate records
df.shape

(152, 13)

**Note: The number of rows is the same before and after the duplicate removal command. Hence there are no duplicate records in the given input file**
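
Comparing shapes works, but duplicates can also be counted directly. The sketch below assumes a small hypothetical `sample` dataframe (not the real input file) in which the last row repeats the first.

```python
import pandas as pd

# Hypothetical stand-in data: the last row repeats the first one exactly
sample = pd.DataFrame({
    "Year": [2014, 2015, 2014],
    "City": ["Ahmedabad", "Ahmedabad", "Ahmedabad"],
    "Murder": [82, 94, 82],
})

# duplicated() marks every repeat of an earlier row as True
n_duplicates = int(sample.duplicated().sum())

# drop_duplicates() returns a new dataframe and keeps the first occurrence
deduplicated = sample.drop_duplicates()
print(n_duplicates, deduplicated.shape)
```

A count of zero from `duplicated().sum()` gives the same conclusion as the unchanged shape, without modifying the dataframe.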

**4) Rearrange the Input file data and columns**

In [11]:
# Reshape the wide dataframe into long format (one row per year, city and
# crime type) so that it will be easy to use in the ML Model
new_df = pd.DataFrame(columns=['Year', 'City', 'Population (in Lakhs) (2011)+', 'Number Of Cases', 'Type'])
for i in range(3, 13):  # columns 3 to 12 hold the ten crime types
    temp = df[['Year', 'City', 'Population (in Lakhs) (2011)+']].copy()
    temp['Number Of Cases'] = df[df.columns[i]]  # case counts for this crime type
    temp['Type'] = df.columns[i]                 # name of this crime type

    new_df = pd.concat([new_df, temp])
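
The same wide-to-long reshaping can probably be expressed with `DataFrame.melt`. This is a sketch of an alternative, not the method used in this program; the two-row `wide` dataframe is a hypothetical stand-in for the input data.

```python
import pandas as pd

# Hypothetical two-row, two-crime stand-in for the input dataframe
wide = pd.DataFrame({
    "Year": [2014, 2015],
    "City": ["Ahmedabad", "Ahmedabad"],
    "Population (in Lakhs) (2011)+": [63.5, 63.5],
    "Murder": [82, 94],
    "Kidnapping": [367, 332],
})

id_columns = ["Year", "City", "Population (in Lakhs) (2011)+"]

# melt() stacks every non-id column into (Type, Number Of Cases) pairs
long = wide.melt(id_vars=id_columns, var_name="Type", value_name="Number Of Cases")
print(long)
```

Two differences from the loop are worth keeping in mind: `melt` places `Type` before `Number Of Cases`, and in recent pandas versions it returns a fresh 0 to n-1 index instead of repeating the original row labels.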

In [12]:
# Display the rearranged dataframe
new_df

,Year,City,Population (in Lakhs) (2011)+,Number Of Cases,Type
0,2014,Ahmedabad,63.5,82,Murder
1,2015,Ahmedabad,63.5,94,Murder
2,2016,Ahmedabad,63.5,103,Murder
3,2017,Ahmedabad,63.5,90,Murder
4,2018,Ahmedabad,63.5,98,Murder
...,...,...,...,...,...
147,2017,Surat,45.8,105,Cyber Crimes
148,2018,Surat,45.8,155,Cyber Crimes
149,2019,Surat,45.8,228,Cyber Crimes
150,2020,Surat,45.8,204,Cyber Crimes


In [13]:
# Compute the crime rate as the number of cases per lakh of population
new_df['Crime Rate'] = new_df['Number Of Cases'] / new_df['Population (in Lakhs) (2011)+']
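
As a worked example of this division, the sketch below uses the first two Ahmedabad murder counts from the input file (82 and 94 cases, population 63.5 lakhs). The `sample` dataframe is a hypothetical stand-in that holds only those two rows.

```python
import pandas as pd

# Two rows copied from the input data: Ahmedabad murders in 2014 and 2015
sample = pd.DataFrame({
    "Number Of Cases": [82, 94],
    "Population (in Lakhs) (2011)+": [63.5, 63.5],
})

# Cases per lakh (100,000) of population; the division is applied row by row
sample["Crime Rate"] = sample["Number Of Cases"] / sample["Population (in Lakhs) (2011)+"]

rates = sample["Crime Rate"].round(6).tolist()
print(rates)
```

Rounded to six decimals this gives about 1.291339 and 1.480315. Because the population column holds the 2011 figure for every year, the denominator is the same for all years of a given city.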

In [14]:
# Display the complete information about the rearranged dataframe
new_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1520 entries, 0 to 151
Data columns (total 6 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Year                           1520 non-null   object 
 1   City                           1520 non-null   object 
 2   Population (in Lakhs) (2011)+  1520 non-null   float64
 3   Number Of Cases                1520 non-null   object 
 4   Type                           1520 non-null   object 
 5   Crime Rate                     1520 non-null   object 
dtypes: float64(1), object(5)
memory usage: 83.1+ KB
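
The output above lists `Year`, `Number Of Cases` and `Crime Rate` with dtype `object` even though they hold numbers. This is likely a side effect of concatenating onto the initially empty `new_df`, though the exact behavior may depend on the pandas version. If numeric dtypes are needed downstream, one option is `pd.to_numeric`. The sketch below is an assumption about how that could look, using a hypothetical `sample` dataframe.

```python
import pandas as pd

# Hypothetical stand-in with numbers stored in object-typed columns
sample = pd.DataFrame({
    "Year": pd.Series([2014, 2015], dtype="object"),
    "Crime Rate": pd.Series([1.291339, 1.480315], dtype="object"),
})

# pd.to_numeric infers an integer or float dtype from the values
for column in ["Year", "Crime Rate"]:
    sample[column] = pd.to_numeric(sample[column])

print(sample.dtypes)
```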


In [15]:
# Display the rearranged dataframe with the new 'Crime Rate' column
new_df

,Year,City,Population (in Lakhs) (2011)+,Number Of Cases,Type,Crime Rate
0,2014,Ahmedabad,63.5,82,Murder,1.291339
1,2015,Ahmedabad,63.5,94,Murder,1.480315
2,2016,Ahmedabad,63.5,103,Murder,1.622047
3,2017,Ahmedabad,63.5,90,Murder,1.417323
4,2018,Ahmedabad,63.5,98,Murder,1.543307
...,...,...,...,...,...,...
147,2017,Surat,45.8,105,Cyber Crimes,2.292576
148,2018,Surat,45.8,155,Cyber Crimes,3.384279
149,2019,Surat,45.8,228,Cyber Crimes,4.978166
150,2020,Surat,45.8,204,Cyber Crimes,4.454148


In [16]:
# Drop the 'Number Of Cases' column now that 'Crime Rate' has been computed
new_df = new_df.drop(['Number Of Cases'], axis=1)

In [17]:
# Display the final rearranged dataframe
new_df

,Year,City,Population (in Lakhs) (2011)+,Type,Crime Rate
0,2014,Ahmedabad,63.5,Murder,1.291339
1,2015,Ahmedabad,63.5,Murder,1.480315
2,2016,Ahmedabad,63.5,Murder,1.622047
3,2017,Ahmedabad,63.5,Murder,1.417323
4,2018,Ahmedabad,63.5,Murder,1.543307
...,...,...,...,...,...
147,2017,Surat,45.8,Cyber Crimes,2.292576
148,2018,Surat,45.8,Cyber Crimes,3.384279
149,2019,Surat,45.8,Cyber Crimes,4.978166
150,2020,Surat,45.8,Cyber Crimes,4.454148


**5) Write the rearranged input data file into the 'Data-Files' folder**

In [18]:
# Writing the rearranged file into "Data-files" directory 
# for the usage of other programs
new_df.to_csv("D:/CRPI-Latest1/Data-Files/CRPI-Mod.csv", index=False)
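
Passing `index=False` keeps the row labels out of the file, so a program that reads it back does not get an extra `Unnamed: 0` column. The sketch below illustrates this round trip with a hypothetical `sample` dataframe and an in-memory buffer instead of the real output path.

```python
import io

import pandas as pd

# Hypothetical stand-in for the rearranged dataframe
sample = pd.DataFrame({
    "Year": [2014, 2015],
    "City": ["Ahmedabad", "Ahmedabad"],
    "Type": ["Murder", "Murder"],
    "Crime Rate": [1.291339, 1.480315],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV back, as a downstream program would
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))
```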