The dataset contains the service requests made to 311 in NYC from 2010 onwards. In this notebook we explore the data and plot some of it geographically to get a feel for where the clubs and bars might be in NYC. It is a basic exploratory exercise to get to know the basic capabilities of Python. I hope you enjoy it; comments are more than welcome, since this is my first public notebook on Kaggle. Thanks a lot.

Exploring the dataset
---------------------

In [None]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import matplotlib as mpl
from matplotlib import pyplot as plt  
%matplotlib inline
plt.style.use(['fivethirtyeight'])
mpl.rcParams['lines.linewidth'] = 3

In [None]:

req_df = pd.read_csv(
    '../input/311_Service_Requests_from_2010_to_Present.csv', header=0,
    sep=',', parse_dates=['Created Date', 'Closed Date'],
    dayfirst=True, index_col='Created Date')

Let's take a quick look at the structure of the data:

In [None]:
req_df.head(3)


Let's get an idea of the size of the dataset:

In [None]:
req_df.shape

Let's take a look at the different types of complaints:

In [None]:
req_df['Complaint Type'].unique()


In [None]:
req_df['Complaint Type'].value_counts().plot(kind='bar', figsize=(10,6))


We plot the number of complaints per borough to get an overall idea of where most of them come from:

In [None]:
req_df['Borough'].value_counts().plot(kind='pie', title='Boroughs')
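
A pie chart shows proportions but not the numbers behind them. As a minimal sketch, assuming a made-up `boroughs` series in place of `req_df['Borough']`, the same shares can be read off as percentages with `value_counts(normalize=True)`:

```python
import pandas as pd

# Toy values standing in for req_df['Borough'] (counts are invented)
boroughs = pd.Series(
    ['BROOKLYN'] * 4 + ['QUEENS'] * 3 + ['MANHATTAN'] * 2 + ['BRONX'])

# normalize=True returns fractions instead of raw counts
shares = boroughs.value_counts(normalize=True).mul(100).round(1)
print(shares)
```

On the real data, replacing `boroughs` with `req_df['Borough']` should give the percentage behind each slice of the pie.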


If we draw all the complaints as a scatter plot of their coordinates, we get the following image:

In [None]:
req_df[['Longitude', 'Latitude']].plot(kind='scatter',
    x='Longitude', y='Latitude', figsize=(10,10)).axis('equal')

If we use a different colour for each borough we obtain the following image:

In [None]:
f, ax = plt.subplots(figsize=(10, 10))
# One colour per borough, all drawn on the same axes
borough_colours = {'MANHATTAN': 'b', 'BROOKLYN': 'r', 'QUEENS': 'g',
                   'BRONX': 'y', 'STATEN ISLAND': 'm'}
for borough, colour in borough_colours.items():
    req_df[req_df['Borough'] == borough].plot(
        kind='scatter', x='Longitude', y='Latitude', color=colour, ax=ax)
ax.axis('equal')

The previous image does not tell us much: the entire city is covered with dots, so it is hard to pick out clusters. A plot that shows density is a better fit, and a hexbin plot, which counts the points falling into each hexagonal cell, will do just fine:

In [None]:
req_df[['Longitude', 'Latitude']].plot(kind='hexbin',
    x='Longitude', y='Latitude', mincnt=1, gridsize=80, colormap='jet', figsize=(10,6)).axis('equal')

It looks like Manhattan is a hot spot.  
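
To make the idea behind the density plot concrete, here is a rough sketch of the counting step. It is not what `hexbin` does internally: it uses square cells via `numpy.histogram2d` as a stand-in for hexagons, and the coordinates are synthetic (a uniform background plus one invented cluster), not the 311 data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic points: a broad background plus one dense cluster (both invented)
background = rng.uniform(low=[-74.25, 40.50], high=[-73.70, 40.90], size=(5000, 2))
cluster = rng.normal(loc=[-73.96, 40.79], scale=0.01, size=(2000, 2))
points = np.vstack([background, cluster])

# Count how many points fall into each cell of a 20x20 grid
counts, lon_edges, lat_edges = np.histogram2d(points[:, 0], points[:, 1], bins=20)

# Find the busiest cell and report its centre
i, j = np.unravel_index(np.argmax(counts), counts.shape)
hot_lon = (lon_edges[i] + lon_edges[i + 1]) / 2
hot_lat = (lat_edges[j] + lat_edges[j + 1]) / 2
print(f"Busiest cell centre: ({hot_lon:.3f}, {hot_lat:.3f}), {int(counts[i, j])} points")
```

In a scatter plot the cluster would be hidden under overlapping dots, whereas the per-cell counts make it stand out immediately.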

**Where is the party??**
------------------------

Let's take a look at the descriptors of the complaints that could be associated with partying:

In [None]:
req_df[req_df['Complaint Type'] == 'Noise - Commercial']['Descriptor'].value_counts()


In [None]:
req_df[req_df['Complaint Type'] == 'Noise - Street/Sidewalk']['Descriptor'].value_counts()


In [None]:
req_df[req_df['Complaint Type'] == 'Noise - House of Worship']['Descriptor'].value_counts()

"Loud Music/Party" seems to be the main reason behind the three previous types of complaints. In the following list we find "Loud Music/Party" to be the second most common descriptor across all complaints.

In [None]:
req_df['Descriptor'].value_counts()
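
Instead of reading the rank off the printed list, it can also be computed. The following is a minimal sketch on invented data: the `descriptors` series and its counts are made up and only stand in for `req_df['Descriptor']`.

```python
import pandas as pd

# Toy descriptors standing in for req_df['Descriptor'] (values and counts invented)
descriptors = pd.Series(
    ['HEAT'] * 5 + ['Loud Music/Party'] * 3 + ['Loud Talking'] * 2 + ['Pothole'])

counts = descriptors.value_counts()                    # most frequent first
rank = counts.index.get_loc('Loud Music/Party') + 1    # 1-based position in the list
share = counts['Loud Music/Party'] / counts.sum()      # fraction of all complaints
print(rank, round(share, 3))
```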


Let's plot only the "Loud Music/Party" complaints using their coordinates:

In [None]:
req_df[req_df['Descriptor'] == 'Loud Music/Party'].plot(
    kind='hexbin', x='Longitude', y='Latitude', gridsize=80,
    colormap='jet', mincnt=1, figsize=(10,6)).axis('equal')

We can see that northern Manhattan has the highest concentration of this type of complaint.

Let's add to the plot the complaints with the `Descriptor` "Loud Talking", and let's restrict it to complaints created between 00:00 and 06:00:

In [None]:
# Complaints created between 00:00 and 05:59 with a party-related descriptor
late_night = req_df.index.hour < 6
party = req_df['Descriptor'].isin(['Loud Music/Party', 'Loud Talking'])
req_df[party & late_night].plot(
    kind='hexbin', x='Longitude', y='Latitude', gridsize=80,
    colormap='jet', mincnt=1, figsize=(10,6)).axis('equal')
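
To check how the time window behaves at its edges, here is a small sketch on a made-up frame. The `toy` frame and its timestamps are invented, and it only assumes that the index is a `DatetimeIndex`, as it is for `req_df` when `Created Date` is parsed as a date.

```python
import pandas as pd

# Invented rows; the index plays the role of 'Created Date'
toy = pd.DataFrame(
    {'Descriptor': ['Loud Music/Party', 'Loud Talking', 'Loud Music/Party',
                    'Loud Talking', 'Banging/Pounding']},
    index=pd.to_datetime(['2015-06-06 01:30', '2015-06-06 05:59',
                          '2015-06-06 06:00', '2015-06-06 22:15',
                          '2015-06-06 02:00']))
toy.index.name = 'Created Date'

# Hours 0 to 5 only, so 05:59 is kept and 06:00 is dropped
late_night = toy.index.hour < 6
party = toy['Descriptor'].isin(['Loud Music/Party', 'Loud Talking'])

late_party = toy[party & late_night]
print(late_party)
```

Only the 01:30 and 05:59 rows should remain: the 06:00 and 22:15 rows fall outside the window, and the 02:00 row has a different descriptor.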

So, I am not from NYC and don't know the city very well, but it seems like Lower Manhattan and especially Upper Manhattan and Harlem are the places to go out.