## Background of Problem Statement

NYC 311's mission is to provide the public with quick and easy access to all New York City government services and information while offering the best customer service. Each day, NYC311 receives thousands of requests related to several hundred types of non-emergency services, including noise complaints, plumbing issues, and illegally parked cars. These requests are received by NYC311 and forwarded to the relevant agencies such as the police, buildings, or transportation. The agency responds to the request, addresses it, and then closes it.

## Problem Objective

Perform a service request data analysis of New York City 311 calls. You will focus on data wrangling techniques to uncover patterns in the data and visualize the major complaint types.
Domain: Customer Service

## Analysis Tasks to be Performed

(Perform a service request data analysis of New York City 311 calls) 

1. Import a 311 NYC service request dataset.
2. Read or convert the columns ‘Created Date’ and ‘Closed Date’ to datetime datatype and create a new column ‘Request_Closing_Time’ as the time elapsed between request creation and request closing. (Hint: Explore the package/module datetime)
3. Provide major insights/patterns that you can offer in a visual format (graphs or tables); at least 4 major conclusions that you can come up with after generic data mining.
4. Order the complaint types based on the average ‘Request_Closing_Time’, grouping them for different locations.
5. Perform a statistical test for the following:

   **Please note:** For the statements below, state the null and alternative hypotheses, then run a statistical test to reject or fail to reject the null hypothesis and report the corresponding p-value.

   - a) Whether the average response time across complaint types is similar or not (overall).
   - b) Whether the type of complaint or service requested and the location are related.
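
One way to write the two hypothesis pairs in task 5 formally is shown below. The notation is introduced here for illustration and is not part of the original brief: $\mu_i$ is the mean `Request_Closing_Time` of complaint type $i$ among $k$ complaint types, and $T$ and $L$ are the complaint type and location of a request.

$$
\begin{aligned}
&\text{(a)}\quad H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
&&H_1:\ \mu_i \neq \mu_j \text{ for at least one pair } i \neq j\\
&\text{(b)}\quad H_0:\ P(T = t,\ L = \ell) = P(T = t)\,P(L = \ell)\ \text{for all } t, \ell
&&H_1:\ T \text{ and } L \text{ are not independent}
\end{aligned}
$$

In both cases $H_0$ is rejected when the p-value falls below the chosen significance level (commonly $\alpha = 0.05$); otherwise the test fails to reject $H_0$.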
        
## Dataset Description :

Field | Description
--------|------------
Unique Key | (Plain text) - Unique identifier for the complaints
Created Date | (Date and Time) - The date and time on which the complaint is raised
Closed Date | (Date and Time)  - The date and time on which the complaint is closed
Agency	| (Plain text) - Agency code
Agency Name	| (Plain text) - Name of the agency
Complaint Type	| (Plain text) - Type of the complaint
Descriptor	| (Plain text) - Complaint type label (Heating - Heat, Traffic Signal Condition - Controller)
Location Type	| (Plain text) - Type of the location (Residential, Restaurant, Bakery, etc)
Incident Zip	| (Plain text) - Zip code for the location
Incident Address	| (Plain text) - Address of the location
Street Name	| (Plain text) - Name of the street
Cross Street 1	| (Plain text) - Detail of cross street
Cross Street 2	| (Plain text) - Detail of another cross street
Intersection Street 1	| (Plain text) - Detail of intersection street if any
Intersection Street 2	| (Plain text) - Detail of another intersection street if any
Address Type	| (Plain text) - Categorical (Address or Intersection)
City	| (Plain text) - City for the location
Landmark	| (Plain text) - Empty field
Facility Type	| (Plain text) - N/A
Status	| (Plain text) - Categorical (Closed or Pending)
Due Date	| (Date and Time) - Date and time for the pending complaints
Resolution Action Updated Date	| (Date and Time) - Date and time when the resolution was provided
Community Board	| (Plain text) - Categorical field (specifies the community board with its code)
Borough	| (Plain text) - Categorical field (specifies the borough)
X Coordinate	| (Number) - X coordinate of the location (State Plane)
Y Coordinate	| (Number) - Y coordinate of the location (State Plane)
Park Facility Name	| (Plain text) - Unspecified
Park Borough	| (Plain text) - Categorical (Unspecified, Queens, Brooklyn etc)
School Name	| (Plain text) - Unspecified
School Number	| (Plain text)  - Unspecified
School Region	| (Plain text)  - Unspecified
School Code	| (Plain text)  - Unspecified
School Phone Number	| (Plain text)  - Unspecified
School Address	| (Plain text)  - Unspecified
School City	| (Plain text)  - Unspecified
School State	| (Plain text)  - Unspecified
School Zip	| (Plain text)  - Unspecified
School Not Found | (Plain text)  - Empty Field
School or Citywide Complaint	| (Plain text)  - Empty Field
Vehicle Type	| (Plain text)  - Empty Field
Taxi Company Borough	| (Plain text)  - Empty Field
Taxi Pick Up Location	| (Plain text)  - Empty Field
Bridge Highway Name	| (Plain text)  - Empty Field
Bridge Highway Direction	| (Plain text)  - Empty Field
Road Ramp	| (Plain text)  - Empty Field
Bridge Highway Segment	| (Plain text)  - Empty Field
Garage Lot Name	| (Plain text)  - Empty Field 
Ferry Direction	| (Plain text)  - Empty Field
Ferry Terminal Name	| (Plain text)  - Empty Field
Latitude	| (Number) - Latitude of the location
Longitude	| (Number) - Longitude of the location
Location	| (Location) - Coordinates (Latitude, Longitude)

```python
# Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

```python
# Import a 311 NYC service request dataset
df = pd.read_csv("../input/service-request-analysis/Service_Requests_Analysis.csv", low_memory=False)
df.head()
```

```python
# Number of rows and columns
df.shape
```

```python
# Column dtypes and non-null counts
df.info()
```

```python
# Summary statistics of the numeric columns
df.describe()
```

```python
# Number of null values per column
df.isnull().sum()
```

```python
# Percentage of missing values per column
(df.isna().sum() / len(df)) * 100
```
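The missing-value percentages can also drive the column selection instead of hard-coding column names. A minimal sketch, assuming a percentage cut-off chosen by the analyst; the helper `columns_mostly_missing`, the default threshold of 90%, and the toy frame are all hypothetical and not part of the original notebook:

```python
import numpy as np
import pandas as pd


def columns_mostly_missing(frame: pd.DataFrame, threshold: float = 90.0) -> list:
    """Return the names of columns whose share of missing values (in %) exceeds `threshold`."""
    missing_pct = frame.isna().mean() * 100
    return missing_pct[missing_pct > threshold].index.tolist()


# Toy frame standing in for the 311 data (made-up values).
toy = pd.DataFrame({
    "Complaint Type": ["Noise", "Blocked Driveway", "Noise", "Illegal Parking"],
    "Vehicle Type": [np.nan, np.nan, np.nan, np.nan],  # 100% missing
    "Landmark": [np.nan, np.nan, np.nan, "Park"],      # 75% missing
})

to_drop = columns_mostly_missing(toy, threshold=70.0)
print(to_drop)  # ['Vehicle Type', 'Landmark']
cleaned = toy.drop(columns=to_drop)
```

On the notebook's data this would read `df.drop(columns=columns_mostly_missing(df), inplace=True)`; which columns it removes depends entirely on the threshold chosen.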

```python
# Drop columns that are empty (listed as "Empty Field" in the dataset description)
df.drop(['School or Citywide Complaint', 'Vehicle Type', 'Taxi Company Borough',
         'Taxi Pick Up Location', 'Garage Lot Name'], axis=1, inplace=True)
```

```python
# Read or convert the columns ‘Created Date’ and ‘Closed Date’ to datetime datatype and create a new column
# ‘Request_Closing_Time’ as the time elapsed between request creation and request closing.
df['Created Date'] = pd.to_datetime(df['Created Date'])
df['Closed Date'] = pd.to_datetime(df['Closed Date'])

df['Request_Closing_Time'] = df['Closed Date'] - df['Created Date']

# Convert the elapsed time to minutes. total_seconds() covers the full duration,
# whereas .dt.seconds would drop whole days. Requests without a Closed Date become NaN.
df['Request_Closing_Time_minutes'] = df['Request_Closing_Time'].dt.total_seconds() / 60
df.head(1)
```
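Why the choice between `.dt.seconds` and `.dt.total_seconds()` matters: `.dt.seconds` returns only the seconds *component* of a timedelta (0 to 86399), so for any request that stays open longer than a day the whole days are silently discarded. A small sketch with two made-up timestamps (not taken from the dataset) shows the difference:

```python
import pandas as pd

# Two made-up requests: one closed within the hour, one open for just over a day.
created = pd.Series(pd.to_datetime(["2015-12-31 23:55:45", "2015-12-30 10:00:00"]))
closed = pd.Series(pd.to_datetime(["2016-01-01 00:55:15", "2015-12-31 12:30:00"]))

elapsed = closed - created  # timedelta64 Series

# .dt.seconds: seconds component only, the days part is dropped.
component_minutes = elapsed.dt.seconds / 60
# .dt.total_seconds(): the whole duration, days included.
total_minutes = elapsed.dt.total_seconds() / 60

print(component_minutes.tolist())  # [59.5, 150.0]
print(total_minutes.tolist())      # [59.5, 1590.0]
```

The second request was open for 1 day 2 h 30 min; only `total_seconds()` reports the full 1590 minutes.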

```python
# Agencies present in the data
df['Agency'].unique()
```

```python
# Provide major insights/patterns that you can offer in a visual format (graphs or tables); at least 4 major conclusions that
# you can come up with after generic data mining.

# Number of complaints per (agency, complaint type), largest first.
# size() + reset_index(name=...) avoids the column-name clash that value_counts() causes across pandas versions.
agen_compl = (df.groupby(['Agency', 'Complaint Type']).size()
                .reset_index(name='Count')
                .sort_values('Count', ascending=False)
                .reset_index(drop=True))
```

```python
# Horizontal bar chart: number of complaints per complaint type
fig = plt.figure(figsize=(10, 8))
plt.grid()
plt.barh(agen_compl['Complaint Type'], agen_compl['Count'], label='complaint count', color='red',
         edgecolor='black', linewidth=1.8, alpha=0.8)

# Annotate each bar with its count
for i, v in enumerate(agen_compl['Count']):
    plt.text(v + 0.2, i + 0.1, str(v), color='blue', fontweight='bold')

plt.show()
```

The chart shows that Blocked Driveway has the most complaints and Animal in a Park the fewest.

```python
# Top 15 (City, Descriptor) combinations by number of complaints
desc_compl = (df.groupby(['City', 'Descriptor'])['Complaint Type'].count()
                .to_frame().sort_values('Complaint Type', ascending=False)
                .reset_index())
desc_compl2 = desc_compl.iloc[:15]

# Label each slice with city and descriptor, since the same descriptor can appear in several cities
labels = desc_compl2['City'] + ' - ' + desc_compl2['Descriptor']
fig = plt.figure(figsize=(8, 8))
plt.pie(desc_compl2['Complaint Type'], labels=labels, autopct='%0.1f%%', shadow=True, radius=1,
        textprops={'fontsize': 15, 'color': 'black'})
plt.show()
desc_compl2
```

```python
# Number of complaints per complaint type in each city
df.groupby(['City', 'Complaint Type']).size().to_frame('Count')
```

```python
# Status counts per city and complaint type
df.groupby(['City', 'Complaint Type'])['Status'].value_counts().to_frame()
```

```python
# Pie chart for the top 15 cities with the most complaints
city_compl = (df.groupby('City')['Complaint Type'].count().to_frame()
                .sort_values('Complaint Type', ascending=False).reset_index())
city_compl2 = city_compl.iloc[:15]

fig = plt.figure(figsize=(8, 8))
plt.pie(city_compl2['Complaint Type'], labels=city_compl2['City'], autopct='%0.1f%%', radius=1, shadow=True,
        textprops={'fontsize': 15, 'color': 'black'})
plt.show()
city_compl2
```

```python
# Average closing time (minutes) for each request status
df.groupby('Status')['Request_Closing_Time_minutes'].mean().to_frame()
```

```python
# Share of complaints (%) by location type
# 'seaborn-pastel' was renamed 'seaborn-v0_8-pastel' in Matplotlib 3.6; use the old name on earlier versions.
plt.style.use('seaborn-v0_8-pastel')
fig = plt.figure(figsize=(20, 15))

# Series indexed by location type; avoids relying on the column name value_counts().to_frame() produces
loc_type = df['Location Type'].value_counts(normalize=True) * 100
plt.grid()
plt.bar(loc_type.index, loc_type.values, width=0.4)
plt.xticks(rotation=65, size=12)
plt.yticks(range(0, 90, 4), size=12)
plt.ylabel("Percentage of Occurrences", size=15)
plt.show()
```

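```python
# Order the complaint types based on the average ‘Request_Closing_Time’, grouping them for different locations.
locations = df[['City', 'Complaint Type', 'Request_Closing_Time', 'Request_Closing_Time_minutes']]

# Average closing time per (city, complaint type), sorted fastest-first within each city
avg_closing = (locations.groupby(['City', 'Complaint Type'])['Request_Closing_Time_minutes']
                 .mean()
                 .reset_index()
                 .sort_values(['City', 'Request_Closing_Time_minutes']))
avg_closing
```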
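The ordering in task 4 can also be expressed as an explicit rank within each location, which makes it easy to read off the fastest- and slowest-closing complaint types per city. A sketch on a small made-up frame; the column names `avg_closing_minutes` and `rank_in_city` are hypothetical:

```python
import pandas as pd

# Made-up closing times (minutes) per city and complaint type.
toy = pd.DataFrame({
    "City": ["BROOKLYN", "BROOKLYN", "BROOKLYN", "BRONX", "BRONX", "BRONX"],
    "Complaint Type": ["Noise", "Noise", "Blocked Driveway",
                       "Noise", "Blocked Driveway", "Blocked Driveway"],
    "Request_Closing_Time_minutes": [60.0, 120.0, 300.0, 45.0, 200.0, 100.0],
})

# Mean closing time per (city, complaint type).
avg = (
    toy.groupby(["City", "Complaint Type"])["Request_Closing_Time_minutes"]
    .mean()
    .reset_index(name="avg_closing_minutes")
)

# Sort by city, then by average closing time (fastest first) within each city.
ordered = avg.sort_values(["City", "avg_closing_minutes"]).reset_index(drop=True)

# Rank complaint types within each city: 1 = fastest to close on average.
ordered["rank_in_city"] = (
    ordered.groupby("City")["avg_closing_minutes"].rank(method="min").astype(int)
)
print(ordered)
```

`method="min"` gives tied complaint types the same (lowest) rank; on the real data, swap `toy` for `df`.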

```python
# Closing time in seconds (full duration, days included); NaN where the request is not closed
df['Request_Closing_Time_in_Second'] = df['Request_Closing_Time'].dt.total_seconds()
```

```python
# Whether the average response time across complaint types is similar or not (overall)
from scipy.stats import ttest_ind

unique_complaints = set(df['Complaint Type'].dropna().unique())

# Mean closing time (seconds) per complaint type
avg_response_time = df.groupby('Complaint Type')['Request_Closing_Time_in_Second'].mean().to_dict()
```
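For the overall question in task 5(a), a single omnibus test across all complaint types is the more usual choice than many pairwise tests. A sketch, assuming one-way ANOVA (`scipy.stats.f_oneway`) as the test; because closing times tend to be right-skewed, the rank-based Kruskal-Wallis test (`scipy.stats.kruskal`) is shown alongside. The toy data below is made up for illustration:

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway, kruskal

# H0: the mean closing time is the same for every complaint type.
# H1: at least one complaint type has a different mean closing time.

# Made-up closing times in seconds; on the notebook's data use df instead of toy.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Complaint Type": ["Noise"] * 50 + ["Blocked Driveway"] * 50 + ["Illegal Parking"] * 50,
    "Request_Closing_Time_in_Second": np.concatenate([
        rng.normal(3600, 600, 50),
        rng.normal(7200, 600, 50),
        rng.normal(3700, 600, 50),
    ]),
})

# One array of closing times per complaint type; NaN (unclosed requests) removed,
# and complaint types with fewer than 2 observations skipped.
groups = [
    g.dropna().to_numpy()
    for _, g in toy.groupby("Complaint Type")["Request_Closing_Time_in_Second"]
]
groups = [g for g in groups if len(g) >= 2]

f_stat, p_anova = f_oneway(*groups)   # one-way ANOVA: compares group means
h_stat, p_kruskal = kruskal(*groups)  # Kruskal-Wallis: rank-based, no normality assumption

alpha = 0.05
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.3g}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kruskal:.3g}")
print("Reject H0" if p_anova < alpha else "Fail to reject H0")
```

A significant result only says that not all means are equal; identifying which complaint types differ still needs pairwise comparisons.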

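```python
from itertools import combinations

# Pairwise two-sample t-tests between complaint types, each pair tested once.
# equal_var=False runs Welch's t-test, which does not assume equal variances.
# With many pairs, some p-values fall below 0.05 by chance; consider a multiple-comparison correction.
for i, k in combinations(unique_complaints, 2):
    print("mean for {} is : {}".format(i, avg_response_time[i]))
    print("mean for {} is : {}".format(k, avg_response_time[k]))

    sample_i = df.loc[df['Complaint Type'] == i, 'Request_Closing_Time_in_Second'].dropna()
    sample_k = df.loc[df['Complaint Type'] == k, 'Request_Closing_Time_in_Second'].dropna()

    test_statistic, p_value = ttest_ind(sample_i, sample_k, equal_var=False)
    print('{} --> p value for similar average for {}, {}\n\n'.format(round(p_value, 3), i, k))
```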
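Task 5(b), whether complaint type and location are related, is a question about two categorical variables, for which a chi-square test of independence on a contingency table is a standard choice. A sketch using `pd.crosstab` and `scipy.stats.chi2_contingency`, assuming `City` as the location column; the toy counts are made up:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# H0: complaint type and location (here: City) are independent.
# H1: complaint type and location are associated.

# Made-up data; on the notebook's data use df instead of toy.
toy = pd.DataFrame({
    "City": ["BROOKLYN"] * 60 + ["BRONX"] * 60,
    "Complaint Type": (["Noise"] * 45 + ["Blocked Driveway"] * 15
                       + ["Noise"] * 15 + ["Blocked Driveway"] * 45),
})

# Contingency table: rows = complaint types, columns = cities, cells = counts.
table = pd.crosstab(toy["Complaint Type"], toy["City"])

chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

The chi-square approximation becomes unreliable when many expected counts are small (a common rule of thumb is below 5), which is likely with many complaint types and cities; restricting to the most frequent complaint types, or using `Borough` instead of `City`, keeps the table denser.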