# Working with Data:
### New York City 311 Requests Made in 2021

`311_Service_Requests_2021.csv` contains data on all Service Requests (SRs) made to NYC's 311 in the first half of January 2021. Select fields have been retained, as described below. The full dataset is available at https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9


### This Notebook
Contains code that **will throw errors** while attempting to do basic analysis of the dataset. Working solutions can be found in `Part 3 -- Exercise Solutions`. Try to get your code working before you check there, though!

### Data Fields:

`Unique Key`: Unique identifier of a Service Request (SR) in the open data set

`Created Date`: Date SR was created

`Closed Date`: Date SR was closed by responding agency

`Agency`: Acronym of responding City Government Agency

`Agency Name`: Full Agency name of responding City Government Agency

`Complaint Type`: This is the first level of a hierarchy identifying the topic of the incident or condition. Complaint Type may have a corresponding Descriptor (below) or may stand alone.

`Descriptor`: This is associated with the Complaint Type, and provides further detail on the incident or condition. Descriptor values depend on the Complaint Type, and are not always required in an SR.

`Location Type`: Describes the type of location used in the address information

`Incident Zip`: Incident location zip code, provided by geovalidation.

`Address Type`: Type of incident location information available.

`City`: City of the incident location provided by geovalidation.

`Landmark`: If the incident location is identified as a Landmark, the name of the landmark is displayed here

`Facility Type`: If available, this field describes the type of city facility associated with the SR

`Status`: Status of SR submitted

`Due Date`: Date when responding agency is expected to update the SR. This is based on the Complaint Type and internal Service Level Agreements (SLAs).

`Resolution Description`: Describes the last action taken on the SR by the responding agency. May describe next or future steps.

`Resolution Action Updated Date`: Date when responding agency last updated the SR.

`BBL`: Borough Block and Lot, provided by geovalidation. Parcel number used to identify the location of buildings and properties in NYC.

`Borough`: Provided by the submitter and confirmed by geovalidation.

`Open Data Channel Type`: Indicates how the SR was submitted to 311: by Phone, Online, Mobile, Other, or Unknown.


# Intro and Setup

(This code runs as expected)
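
The loading cell in this section builds a column-oriented dictionary: each header maps to a list, and position `i` in every list belongs to the same row. A minimal sketch of that layout, using a made-up two-row sample rather than the real file (the sample values and the helper name `get_row` are assumptions for illustration only):

```python
import io

# Hypothetical tab-separated sample; the real file has more columns and rows.
sample = "Unique Key\tStatus\tBorough\n1001\tClosed\tBROOKLYN\n1002\tOpen\tQUEENS\n"

data_sample = dict()

with io.StringIO(sample) as fp:
    for i, line in enumerate(fp):
        fields = line.strip().split('\t')
        if i == 0:
            # first line holds the column names
            header = fields
            for key in header:
                data_sample.setdefault(key, list())
        else:
            # every later line contributes one value to each column list
            for key, value in zip(header, fields):
                data_sample[key].append(value)


def get_row(columns, i):
    """Rebuild row i as a {header: value} dictionary (hypothetical helper)."""
    return {key: values[i] for key, values in columns.items()}


print(data_sample['Status'])
print(get_row(data_sample, 1))
```

Reading across the lists at a single index, as `get_row` does, is how the later cells pair a request's `Created Date` with its `Closed Date` or `Status`.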

In [None]:
# --------- Global variables ------------#
filename = '311_Service_Requests_2021.txt'

In [None]:
# Load data as a dictionary of form:
# {key: [values]}

# NOTE: Pandas (https://pandas.pydata.org) is a common package for working with 
#       table data. Here, though, we'll do all analysis in base Python

data = dict()

with open(filename, 'r') as fp:
    
    # for each line of data
    for i, line in enumerate(fp.readlines()):
        
        # strip any white space and split using tabs ('\t')
        line = line.strip().split('\t')
        
        if i == 0:
            # the first row contains header values
            header = line
            
            # make these header values the keys of the dictionary
            for key in header:
                data.setdefault(key, list())
                
        else:
            # for all lines past the first, append value to appropriate header list
            for key, value in zip(header, line):
                data[key].append(value)
                
# Now we have a dictionary where index i of every value list refers to the same row of data

In [None]:
# Look at sample of data

for key, values in data.items():
    print(key, values[:2]) # print first two "rows" of data

# Questions and Exercises

This code will ask and (attempt to) answer questions about the dataset. However, this Notebook contains **broken code that will throw errors**. Your task is to debug the code.
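
When a cell fails, the last line of the traceback is usually the best place to start: it names the exception type and gives a short message. A minimal sketch of what those two pieces look like, using made-up failures unrelated to the dataset (the helper name `try_call` is an assumption for illustration only):

```python
def try_call(func, *args):
    """Call func(*args); return its result, or 'ExceptionType: message' if it raises."""
    try:
        return func(*args)
    except Exception as err:
        return '{}: {}'.format(type(err).__name__, err)


# A call that succeeds returns its normal result
print(try_call(int, '42'))

# A string that is not a number raises a ValueError
print(try_call(int, 'forty-two'))

# Looking up a missing dictionary key raises a KeyError
print(try_call({'a': 1}.__getitem__, 'b'))
```

The exception type tells you what kind of mistake to look for, and the message usually points at the offending value or name.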

In [None]:
# QUESTION: How many Service Requests are in this dataset?
    
print('There are {} service requests.'.format(len(data)))

# Answer: There are 90271 service requests.

In [None]:
# QUESTION: What is the most recent day of created requests included in this dataset?

# TIPS: the dates are *not* in order
#       All Service Requests were created in Jan 2021 (i.e., month and year don't change)
#       'Created Date' is a string of the format: '01/09/2021 11:56:15 AM'

last_day = 0 

for timestamp in data['Created Date']:   
    date, time = timestamp.split() # splits on space by default
            
    month, day, year = date.split('/')
            
    if int(day) > last_day:
        last_day = int(day) # keep record of highest day
            
print('The last date is 1/{}/21'.format(last_day))
# A: The last date is 1/16/21

In [None]:
# QUESTION: What are the top five most common "Complaint Type"s?

complaint_counts = dict()

for complaint in data['Complaint Type']:
    # for each new type we see, add to dict with default of 0 occurrences
    complaint_counts.setdefault(comp, 0)
    
    # increment count
    complaint_counts[complaint] += 1
    
# sort this dictionary by value from highest count to lowest
top_complaints = dict((k, v) for k, v in sorted(complaint_counts.items(),
                  key=lambda item: item[1], reverse=True))
# read more about lambda functions: https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions

print('Top complaint types:')
for i, (key, val) in enumerate(top_complaints.items()):
    if i < 5:
        print('   {}. {}: {} complaints'.format(i + 1, key, val))

In [None]:
# QUESTION: What percent of Service Requests are closed?
closed = data['Status'].count('Closed')

print('{} ({:.2%}) of service requests are closed'.format(len(closed), len(closed)/len(data['Status'])))
#A: 69390 (76.87%) of service requests are closed

In [None]:
# QUESTION: How often are different "Open Data Channel Type" used to submit service requests?

for channel in data['Open Data Channel Type']:
    count = data['Open Data Channel Type'].count(channel)

    print('Channel {} is used {} ({:.2%}) of the time'.format(channel, 
                                                              count, 
                                                              count/len(data['Open Data Channel Type'])))

In [None]:
# QUESTION: What is the average time (in days) a request remains open for?
#           What is the maximum time? Minimum time?

days_open = list()

for i, timestamp in enumerate(data['Closed Date']):
    
    # if closed date is not null
    if timestamp != '':
        date, time, am_pm = timestamp.split()
        
        month, close_day, year = date.split()
        
        # get associated open date
        open_date = data['Created Date']
        
        month, open_day, year = open_date.split('/')
        
        # record days between closing and opening of request
        days_open.append(int(close_day) - int(open_day))
        
print('Closed requests were open for an average of {:.2} days'.format(sum(days_open)/len(days_open)))
print('The shortest time open was {} days'.format(min(days_open)))
print('The longest time open was {} days'.format(max(days_open)))

# Answers:
# Closed requests were open for an average of 1.1 days
# The shortest time open was -8 days  <--- this is negative! (see below)
# The longest time open was 27 days

In [None]:
# QUESTION: How can requests have been open for a negative number of days?

# to start, record index where closed date is before open date
weird_indexes = dict()

for i, timestamp in enumerate(data['Closed Date']):
    if timestamp == '':
        date, time, __ = timestamp.split()
        
        month, close_day, year = date.split('/')
        
        # open date
        open_date = data['Created Date'][i]
        
        month, open_day, year = open_date.split('/')
        
        diff = int(close_day) - int(open_day)
        
        if diff < 0:
            weird_indexes[i] = diff
            
print('{} entries closed before they were opened'.format(len(weird_indexes)))

In [None]:
# see just the first 5 records
for index in weird_indexes.keys()[:5]:
    print('Created:', data['Created Date'][index])
    print('Closed:', data['Closed Date'][index])
    print('Status:', data['Status'][index])
    print()

In [None]:
# look at a sample of records with a -8 day difference
display = 5

for index, diff in weird_indexes:
    if diff == -8 and display >0:
        print('Created:', data['Created Date'][index])
        print('Closed:', data['Closed Date'][index])
        print('Status:', data['Status'][index])
        print()
        display -= 1

In [None]:
# TAKE 2:   What is the average time (in days) a request remains open for?
#           What is the maximum time? Minimum time?

days_open = set()

for i, status in enumerate(data['Status']):
    if status == 'Closed':
        close_timestamp = data['Closed Date'][i]
        
        if end_timestamp != '':
            close_date, time, __ = close_timestamp.split()

            month, close_day, year = close_date.split('/')

            # open date
            open_date = data['Created Date'][i]

            month, open_day, year = open_date.split('/')

            days_open.append(int(close_day) - int(open_day))
        
print('Closed requests were open for an average of {:.2} days'.format(sum(days_open)/len(days_open)))
print('The shortest time open was {} days'.format(min(days_open)))
print('The longest time open was {} days'.format(max(days_open)))

# Bonus
If you have extra time -- what other questions would you ask of the data?
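
One possible follow-up is to measure how long a request stayed open using full timestamps rather than the day of the month alone. Below is a sketch using the standard library's `datetime.strptime`, assuming every timestamp follows the `'01/09/2021 11:56:15 AM'` format shown in the tips above. The example timestamps and the helper name `days_between` are made up for illustration and are not taken from the dataset.

```python
from datetime import datetime

# Assumed format: month/day/year, 12-hour clock, AM/PM marker
TIMESTAMP_FORMAT = '%m/%d/%Y %I:%M:%S %p'


def days_between(created, closed):
    """Return the elapsed time between two timestamp strings, in (fractional) days."""
    created_at = datetime.strptime(created, TIMESTAMP_FORMAT)
    closed_at = datetime.strptime(closed, TIMESTAMP_FORMAT)
    return (closed_at - created_at).total_seconds() / 86400  # 86400 seconds per day


# Hypothetical request created in January and closed in February
print(days_between('01/09/2021 11:56:15 AM', '02/01/2021 11:56:15 AM'))

# Hypothetical request closed twelve hours after it was created
print(days_between('01/09/2021 11:56:15 AM', '01/09/2021 11:56:15 PM'))
```

Because the subtraction works on full dates, a request closed in a later month than it was created still yields a positive duration.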