Basic and Useful Data Analysis of NYC Parks Using Python
Part 1: Reading and Writing CSV Files in Python

**Goal:** In this notebook, we will read (load) and write (save) data from NYC Open Data. Specifically, we will focus on reading our data into a pandas DataFrame.

**Main Library:** [pandas](https://pandas.pydata.org/) is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

The data used for this project is available here (CSV):

https://data.cityofnewyork.us/api/views/n8q6-i44s/rows.csv?accessType=DOWNLOAD&bom=true&format=true
Note: we are using a somewhat balanced, clean, pre-processed CSV, so we don't have to manually inspect, clean, and wrangle this file.
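Instead of downloading the file by hand, `pd.read_csv` also accepts a URL, so the export link above can be read directly. Below is a minimal sketch; the helper name `load_structures` is hypothetical, and `encoding="utf-8-sig"` is an assumption based on the `bom=true` parameter in the link. That codec strips a leading UTF-8 byte-order mark (BOM) if one is present and otherwise behaves like plain UTF-8, so the first column name is not polluted by an invisible character.

```python
import io

import pandas as pd

# Export link from NYC Open Data (copied from the text above)
DATA_URL = (
    "https://data.cityofnewyork.us/api/views/n8q6-i44s/rows.csv"
    "?accessType=DOWNLOAD&bom=true&format=true"
)


def load_structures(source=DATA_URL):
    """Hypothetical helper: read the structures CSV from a URL, path, or file object.

    'utf-8-sig' removes a leading byte-order mark if the file has one.
    """
    return pd.read_csv(source, encoding="utf-8-sig")


# Offline check with a tiny in-memory CSV that starts with a BOM
sample = io.BytesIO("\ufeffborough,BIN\nQN,4539831\n".encode("utf-8"))
print(load_structures(sample).columns.tolist())  # ['borough', 'BIN']
```

In the notebook, `df = load_structures()` would fetch the current export straight from the URL (this needs an internet connection).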

Next, to build this project, we import the necessary libraries:


In [6]:
# importing libraries
import pandas as pd 
import numpy as np 
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib.ticker import FuncFormatter
import seaborn as sns
from scipy import stats
from numpy import median
sns.set_palette("tab10")  # use the tab10 palette as the default for seaborn plots

Next, we read the CSV file into a DataFrame and preview its first rows and general information:


In [16]:
# read data as a dataframe
df = pd.read_csv(r'C:\Users\M\Downloads\NYC_Parks_Structures.csv')

# previewing first five rows in data
df.head()






In [18]:
# previewing last five rows in data
df.tail()

,Alteration_Year,BBL,BIN,borough,Comfort Station,CNSTRCT_YR,DESCRIPTION,DOITT_ID,DOITT_Source,Demolition_Year,...,HEIGHTROOF,LOCATION,MaintBy,MaintBySpec,OMPPROPID,Parks_District,SYSTEM,multipolygon,Recreation_Center,FeatureStatus
2771,,4059170000.0,4539831.0,QN,False,,Fort Totten Park-Building,827172.0,current,,...,42.0,,,,Q458,Q-07,Q458-BLG0082,MULTIPOLYGON (((-73.7751400326344 40.794544083...,False,Inactive
2772,,4059170000.0,,QN,False,,Fort Totten Park-Building,,,,...,,,,,Q458-ZN01,Q-07,Q458-BLG0081,MULTIPOLYGON (((-73.77646199134854 40.79456132...,False,Inactive
2773,,4059170000.0,4000000.0,QN,False,,Fort Totten Park-Building,,,,...,,,,,Q458,Q-07,Q458-BLG0079,MULTIPOLYGON (((-73.77955618031557 40.79495439...,False,Inactive
2774,,4059170000.0,4539845.0,QN,False,,Fort Totten Park-Building 342-Swimming Pool Fi...,486282.0,current,,...,9.0,342 Story Avenue,,,Q458,Q-07,Q458-BLG0078,MULTIPOLYGON (((-73.77436014056161 40.79226430...,False,Active
2775,,4059170000.0,4453939.0,QN,False,,Fort Totten Park-Building,246576.0,current,,...,34.0,,,,Q458,Q-07,Q458-BLG0077,MULTIPOLYGON (((-73.77338091578626 40.78984247...,False,Inactive
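The read cell above hard-codes a Windows path; the `r` prefix makes it a raw string so that sequences such as `\U` are not treated as escape codes. Since the goal of this part also includes writing data, here is a minimal sketch of a save-and-reload round trip that builds the path with `pathlib`, which works the same way on Windows, macOS, and Linux. The small `sample_df`, the file name `nyc_parks_structures_copy.csv`, and the temporary directory are illustrative assumptions. Passing `index=False` to `to_csv` keeps pandas from writing the row index as an extra unnamed column, which would otherwise come back as `Unnamed: 0` on the next read.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small stand-in for the real DataFrame (values taken from the preview above)
sample_df = pd.DataFrame({
    "borough": ["QN", "QN"],
    "SYSTEM": ["Q458-BLG0082", "Q458-BLG0081"],
    "FeatureStatus": ["Inactive", "Inactive"],
})

with tempfile.TemporaryDirectory() as tmp:
    # Path objects are joined with "/" regardless of the operating system
    out_path = Path(tmp) / "nyc_parks_structures_copy.csv"

    # Write (save) without the RangeIndex so no extra column is created
    sample_df.to_csv(out_path, index=False)

    # Read (load) it back before the temporary directory is deleted
    reloaded = pd.read_csv(out_path)

print(reloaded.columns.tolist())  # ['borough', 'SYSTEM', 'FeatureStatus']
print(reloaded.equals(sample_df))  # True
```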


In [19]:
# printing the shape (dimensions) of our dataframe
rows, columns = df.shape
print('This dataset has {:,} rows and {:,} columns.'.format(rows, columns))

This dataset has 2,776 rows and 22 columns.
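`df.shape` is a `(rows, columns)` tuple, and the `{:,}` format spec inserts thousands separators, which is why 2776 prints as `2,776`. A minimal sketch of the same check written as an f-string, run on a small made-up DataFrame (`demo` is hypothetical):

```python
import pandas as pd

# Hypothetical three-row DataFrame standing in for the parks data
demo = pd.DataFrame({"borough": ["QN", "BX", "MN"], "HEIGHTROOF": [42.0, 9.0, 34.0]})

rows, columns = demo.shape  # unpack the (rows, columns) tuple
assert rows == len(demo)    # len() of a DataFrame counts its rows

print(f"This dataset has {rows:,} rows and {columns:,} columns.")
# This dataset has 3 rows and 2 columns.
print(f"{2776:,}")  # 2,776
```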


In [21]:
# the object's type
type(df)

pandas.core.frame.DataFrame

In [22]:
# printing each column name of our dataframe on a new line
for col in df.columns:
    print(col)

Alteration_Year
BBL
BIN
borough
Comfort Station
CNSTRCT_YR
DESCRIPTION
DOITT_ID
DOITT_Source
Demolition_Year
GISPROPNUM
GROUNDELEV
HEIGHTROOF
LOCATION
MaintBy
MaintBySpec
OMPPROPID
Parks_District
SYSTEM
multipolygon
Recreation_Center
FeatureStatus
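`df.columns` is a pandas `Index`, so `df.columns.tolist()` returns the same names as a plain list. The names are not uniform (for example, `Comfort Station` contains a space), and any name that is not a valid Python identifier cannot be used with attribute access (`df.name`); it must be written as `df["Comfort Station"]`. A minimal sketch that flags such names on a subset of the columns above; the `needs_brackets` helper is hypothetical:

```python
import keyword

import pandas as pd

# A subset of the column names printed above
cols = pd.Index(["Alteration_Year", "BBL", "borough", "Comfort Station", "CNSTRCT_YR"])


def needs_brackets(name):
    """Hypothetical helper: True if df.<name> syntax is impossible for this name."""
    return not name.isidentifier() or keyword.iskeyword(name)


print(cols.tolist())
print([c for c in cols if needs_brackets(c)])  # ['Comfort Station']
```

On the real data, `[c for c in df.columns if needs_brackets(c)]` would list every column that requires bracket access.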


In [23]:
# printing the data types of our columns
df.dtypes

Alteration_Year      float64
BBL                  float64
BIN                  float64
borough               object
Comfort Station       object
CNSTRCT_YR           float64
DESCRIPTION           object
DOITT_ID             float64
DOITT_Source          object
Demolition_Year      float64
GISPROPNUM            object
GROUNDELEV           float64
HEIGHTROOF           float64
LOCATION              object
MaintBy              float64
MaintBySpec          float64
OMPPROPID             object
Parks_District        object
SYSTEM                object
multipolygon          object
Recreation_Center     object
FeatureStatus         object
dtype: object
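Identifier-like columns such as `BBL`, `BIN`, and `DOITT_ID`, as well as the year column `CNSTRCT_YR`, show up as `float64` even though they hold whole numbers. NumPy integer dtypes cannot represent missing values, so pandas falls back to floats when a column has gaps, which is why the preview shows `4059170000.0`. One option is pandas' nullable `Int64` dtype, requested at read time through the `dtype` argument of `read_csv`. The sketch below uses a tiny in-memory CSV; which columns to convert is an assumption to verify against the real file (they must contain only whole numbers).

```python
import io

import pandas as pd

# Tiny stand-in CSV: the second row has no BIN or CNSTRCT_YR
csv_text = "BBL,BIN,CNSTRCT_YR\n4059170000,4539831,1950\n4059170000,,\n"

# Default parsing: the gap in BIN forces float64
default = pd.read_csv(io.StringIO(csv_text))
print(default["BIN"].dtype)  # float64

# Nullable integers: whole numbers stay integers, gaps become <NA>
id_cols = {"BBL": "Int64", "BIN": "Int64", "CNSTRCT_YR": "Int64"}
typed = pd.read_csv(io.StringIO(csv_text), dtype=id_cols)
print(typed["BIN"].dtype)  # Int64
```

For a DataFrame that is already loaded, `df[cols] = df[cols].astype("Int64")` performs the same conversion on float columns whose non-missing values are whole numbers.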

In [24]:
# printing index type
df.index

RangeIndex(start=0, stop=2776, step=1)
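`RangeIndex(start=0, stop=2776, step=1)` means the rows are labelled 0 to 2775 by position. If a column uniquely identifies each structure, it can serve as the index instead. `SYSTEM` values in the preview (e.g. `Q458-BLG0082`) look like per-structure IDs, but that is an assumption, so the sketch below checks `is_unique` and for missing values before calling `set_index`. The three stand-in rows are copied from the `tail()` preview.

```python
import pandas as pd

# Stand-in rows copied from the tail() preview
sample_df = pd.DataFrame({
    "SYSTEM": ["Q458-BLG0082", "Q458-BLG0081", "Q458-BLG0079"],
    "FeatureStatus": ["Inactive", "Inactive", "Inactive"],
})

print(sample_df.index)  # RangeIndex(start=0, stop=3, step=1)

# Only promote SYSTEM to the index if no value repeats and none is missing
key = sample_df["SYSTEM"]
if key.is_unique and key.notna().all():
    indexed = sample_df.set_index("SYSTEM")
    print(indexed.loc["Q458-BLG0079", "FeatureStatus"])  # Inactive
else:
    print("SYSTEM is not a unique key; keeping the RangeIndex")
```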

In [25]:
# printing the column names, non-null counts,
# and datatypes of our columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2776 entries, 0 to 2775
Data columns (total 22 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Alteration_Year    42 non-null     float64
 1   BBL                2675 non-null   float64
 2   BIN                2447 non-null   float64
 3   borough            2764 non-null   object 
 4   Comfort Station    2764 non-null   object 
 5   CNSTRCT_YR         1805 non-null   float64
 6   DESCRIPTION        2579 non-null   object 
 7   DOITT_ID           2281 non-null   float64
 8   DOITT_Source       2240 non-null   object 
 9   Demolition_Year    0 non-null      float64
 10  GISPROPNUM         2760 non-null   object 
 11  GROUNDELEV         2234 non-null   float64
 12  HEIGHTROOF         2164 non-null   float64
 13  LOCATION           1647 non-null   object 
 14  MaintBy            0 non-null      float64
 15  MaintBySpec        0 non-null      float64
 16  OMPPROPID          2720 