# Data Reading

My first in-class exposure to reading and setting up data in Python. After the Data Mining course finished, I added markdown cells and code and improved some lines to demonstrate familiarity.

## Load all libraries

In [1]:
import pandas as pd
import numpy as np
import csv
import json

from io import StringIO  # StringIO provides an in-memory, file-like text object

# Useful code to remove unnecessary warnings
import warnings
warnings.filterwarnings("ignore")

## Read CSV

The key point here is that `read_csv` reads not only CSV files on disk but also file-like objects that hold CSV text, such as a `StringIO` buffer.

### Item 1 
Read the in-memory file `one` and create a DataFrame from it.

In [2]:
one = StringIO('''p,    s,   m
1, 2, 3
4, 5, 6
7, 8, 9
''')

**Your answer below:**

In [3]:
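# skipinitialspace=True strips the spaces after each comma, so the columns are named 'p', 's', 'm'
df = pd.read_csv(one, skipinitialspace=True)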
df


|   | p | s | m |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |


Here we assume that p, s, and m are column names and that no row index was provided, so pandas assigns a default integer index.
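The header line of `one` has spaces after its commas, which is easy to miss because the displayed table looks the same either way. The sketch below, which rebuilds the same text under the hypothetical name `csv_text`, compares the column names from a default `read_csv` call with those from a call that passes `skipinitialspace=True`.

```python
from io import StringIO

import pandas as pd

# Same text as the in-memory file `one` (csv_text is a name made up for this sketch)
csv_text = "p,    s,   m\n1, 2, 3\n4, 5, 6\n7, 8, 9\n"

# Default parsing keeps the spaces that follow each comma in the header names
raw = pd.read_csv(StringIO(csv_text))
print(list(raw.columns))

# skipinitialspace=True drops whitespace that comes right after a delimiter
clean = pd.read_csv(StringIO(csv_text), skipinitialspace=True)
print(list(clean.columns))

# With clean names, plain column access works as expected
print(clean["s"].tolist())
```

With the default call, selecting `raw["s"]` would raise a `KeyError`, since that column name still carries its leading spaces.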

### Item 2
Read the in-memory file `two` and create a DataFrame in which every cell that has no value in the CSV is set to `NaN`.

In [4]:
two = StringIO('''1, 2, 3
4, 5
7
''')

**Your answer below:**

In [5]:
# header=None: the first line is data, not column names; shorter rows are padded with NaN
df = pd.read_csv(two, header=None, index_col=False)
df

|   | 0 | 1   | 2   |
|---|---|-----|-----|
| 0 | 1 | 2.0 | 3.0 |
| 1 | 4 | 5.0 | NaN |
| 2 | 7 | NaN | NaN |
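To confirm that the empty cells really became `NaN`, the missing values can be counted per column. The sketch below repeats the Item 2 read under new variable names, then tries a second, made-up input (`short_first`) whose first row is the shortest. That second case is an assumption beyond the exercise: there `read_csv` cannot infer the column count from the first line, so the sketch passes `names=` to fix it up front.

```python
from io import StringIO

import pandas as pd

# Same ragged text as the in-memory file `two`
ragged = StringIO("1, 2, 3\n4, 5\n7\n")
df_ragged = pd.read_csv(ragged, header=None)

# Count the NaN cells in each column
missing_per_column = df_ragged.isna().sum()
print(missing_per_column.tolist())

# Hypothetical variant: the first row is the shortest one.
# Supplying names= tells the parser to expect three columns on every row.
short_first = StringIO("1\n4, 5\n7, 8, 9\n")
df_named = pd.read_csv(short_first, header=None, names=[0, 1, 2])
print(df_named)
```

Columns 1 and 2 come back as floats because `NaN` is a floating-point value, which matches the `2.0` and `5.0` in the output above.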


## Read Excel

### Item 3
Read `NCR.xlsx` and return a DataFrame with two columns: 'City, Municipality, and Barangay' and 'Total Population'.

**Your answer below:**

In [6]:
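# read_excel already returns a DataFrame, so no extra conversion is needed.
# Skip the first 5 rows of the sheet and keep only the columns at positions 1 and 2.
df = pd.read_excel('NCR.xlsx', skiprows=5, usecols=[1, 2])

# Define the column names
df.columns = ['Total Population by City, Municipality and Barangay', 'Total Population']

# Keep only the rows that have a population value
df_final = df.loc[pd.notnull(df.iloc[:, 1]), :]

df_final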

|      | Total Population by City, Municipality and Barangay | Total Population |
|------|------------------------------------------------------|------------------|
| 0    | NATIONAL CAPITAL REGION | 12877253.0 |
| 2    | CITY OF MANILA | 1780148.0 |
| 4    | TONDO | 631363.0 |
| 5    | Barangay 1 | 1976.0 |
| 6    | Barangay 2 | 1662.0 |
| ...  | ... | ... |
| 1766 | North Signal Village | 32112.0 |
| 1767 | Pinagsama | 57343.0 |
| 1768 | San Miguel | 8590.0 |
| 1769 | South Daang Hari | 19166.0 |
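The index in the output skips 1 and 3 because those rows had no population value and were filtered out. Since `NCR.xlsx` is not reproduced here, the sketch below uses a small hand-built frame (`sample`, with the first three values copied from the output above and the blank rows assumed) to show that the boolean mask and `dropna(subset=...)` select the same rows, and how `reset_index` would renumber them.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the first rows of the sheet; the blank rows are an assumption
sample = pd.DataFrame({
    "Total Population by City, Municipality and Barangay": [
        "NATIONAL CAPITAL REGION", None, "CITY OF MANILA", None, "TONDO",
    ],
    "Total Population": [12877253.0, np.nan, 1780148.0, np.nan, 631363.0],
})

# Boolean mask on the second column, as in the answer above
by_mask = sample.loc[pd.notnull(sample.iloc[:, 1]), :]

# Equivalent filter that names the column instead of using its position
by_dropna = sample.dropna(subset=["Total Population"])
print(by_dropna.index.tolist())

# Optional: renumber the remaining rows from 0
renumbered = by_dropna.reset_index(drop=True)
print(renumbered.index.tolist())
```

Keeping the original index, as the answer does, makes it easy to trace a row back to its position in the spreadsheet; resetting it is only a matter of presentation.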


## Read JSON File

### Item 4
Read the JSON file at `https://raw.githubusercontent.com/domoritz/maps/master/data/iris.json` and create a pandas DataFrame from it.

**Your answer below:**

In [7]:
df = pd.read_json('https://raw.githubusercontent.com/domoritz/maps/master/data/iris.json')
df


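|     | sepalLength | sepalWidth | petalLength | petalWidth | species |
|-----|-------------|------------|-------------|------------|---------|
| 0   | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1   | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2   | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3   | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4   | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |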
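As with `read_csv`, `read_json` also accepts a file-like object, which is handy when the URL is unreachable. The sketch below assumes the remote file is a JSON list of records, which is consistent with the columns shown above but not verified here, and rebuilds the first two rows in a `StringIO` buffer named `records` for illustration.

```python
from io import StringIO

import pandas as pd

# Two records copied from the output above; the list-of-records layout is an assumption
records = StringIO('''[
  {"sepalLength": 5.1, "sepalWidth": 3.5, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.9, "sepalWidth": 3.0, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"}
]''')

# Each JSON object becomes one row; the keys become the column names
iris_sample = pd.read_json(records)
print(iris_sample.shape)
print(list(iris_sample.columns))
```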


## Read Gzip-Compressed File

### Item 5
Read `listings.csv.gz` and then create the Excel file `airbnb.xlsx`.

**Your answer below:**

In [8]:
df = pd.read_csv('listings.csv.gz', compression='gzip')
df

|   | id | listing_url | scrape_id | last_scraped | name | summary | space | description | picture_url | host_id | ... | review_scores_cleanliness | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value | requires_license | license | jurisdiction_names | calculated_host_listings_count | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 600795 | https://www.airbnb.com/rooms/600795 | 20150406154443 | 2015-04-06 | London Zone1 LOFT 1Bed/1Lounge flat | Really Gorgeous LOFT flat for 1-5 people, near... | GREAT VALUE, YET VERY COMFORTABLE FLAT NEAR ZO... | Really Gorgeous LOFT flat for 1-5 people, near... | https://a0.muscache.com/ic/pictures/8253726/36... | 485861 | ... | 9.0 | 8.0 | 9.0 | 8.0 | 8.0 | f | NaN | NaN | 8 | 2.1 |
| 1 | 4530027 | https://www.airbnb.com/rooms/4530027 | 20150406154443 | 2015-04-07 | Double room 20mins to Oxford Street | Double room available in a beautifully develop... | NaN | Double room available in a beautifully develop... | https://a1.muscache.com/ic/pictures/56886040/2... | 23486604 | ... | NaN | NaN | NaN | NaN | NaN | f | NaN | NaN | 1 | NaN |
| 2 | 896177 | https://www.airbnb.com/rooms/896177 | 20150406154443 | 2015-04-06 | Cosy room for 1 or 2 in Kennington | NaN | Private room with large windows onto and direc... | Private room with large windows onto and direc... | https://a1.muscache.com/ic/pictures/13227237/3... | 4777568 | ... | 10.0 | 10.0 | 10.0 | 9.0 | 10.0 | f | NaN | NaN | 1 | 0.4 |
| 3 | 3770567 | https://www.airbnb.com/rooms/3770567 | 20150406154443 | 2015-04-06 | Double Room with large garden | A ground floor room in the heart of Islington.... | NaN | A ground floor room in the heart of Islington.... | https://a0.muscache.com/ic/pictures/47447139/6... | 11255553 | ... | NaN | NaN | NaN | NaN | NaN | f | NaN | NaN | 1 | NaN |
| 4 | 4292560 | https://www.airbnb.com/rooms/4292560 | 20150406154443 | 2015-04-07 | Superb 3BR House Notting Hill VP | Fabulous 3 bedroom Mews house in Notting Hill ... | This house contains a cosy living room on the ... | Fabulous 3 bedroom Mews house in Notting Hill ... | https://a1.muscache.com/ic/pictures/67441108/4... | 1432477 | ... | 9.0 | 9.0 | 9.0 | 9.0 | 8.0 | f | NaN | NaN | 100 | 1.8 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18431 | 1313539 | https://www.airbnb.com/rooms/1313539 | 20150406154443 | 2015-04-07 | Liverpool Street Double Room *11 | Hello dear visitors and welcome to our Monopol... | Hi, WE HAVE BRIGHT AND SPACIOUS FULLY FURNISHE... | Hello dear visitors and welcome to our Monopol... | https://a2.muscache.com/ic/pictures/63084242/c... | 7106582 | ... | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | f | NaN | NaN | 10 | 4.9 |
| 18432 | 680656 | https://www.airbnb.com/rooms/680656 | 20150406154443 | 2015-04-07 | Great Triple Studio, in Hammermith | NaN | Just a 2-minute walk from Hammersmith Tube Sta... | Just a 2-minute walk from Hammersmith Tube Sta... | https://a2.muscache.com/ic/pictures/12438816/3... | 216660 | ... | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | f | NaN | NaN | 9 | 0.4 |
| 18433 | 5373885 | https://www.airbnb.com/rooms/5373885 | 20150406154443 | 2015-04-07 | Double En-suite | Well furnished double bedroom with attache... | NaN | Well furnished double bedroom with attache... | https://a2.muscache.com/ic/pictures/67458346/1... | 23211855 | ... | NaN | NaN | NaN | NaN | NaN | f | NaN | NaN | 3 | NaN |
| 18434 | 586400 | https://www.airbnb.com/rooms/586400 | 20150406154443 | 2015-04-07 | Modern, stylish apartment in Camden | NaN | Our Camden apartment is within close proximity... | Our Camden apartment is within close proximity... | https://a1.muscache.com/ic/pictures/7303558/35... | 2890674 | ... | NaN | NaN | NaN | NaN | NaN | f | NaN | NaN | 1 | NaN |


In [9]:
# Save to Excel
df.to_excel("airbnb.xlsx")
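The compressed read can be tried without the original download. The sketch below writes a two-row frame (`listings`, with ids and names copied from the output above) to a temporary `.csv.gz` file and reads it back. It relies on the default `compression="infer"` setting, which picks gzip from the `.gz` extension; the file name `sample_listings.csv.gz` is made up for this example.

```python
import os
import tempfile

import pandas as pd

# Two listings copied from the output above, used only as sample data
listings = pd.DataFrame({
    "id": [600795, 4530027],
    "name": ["London Zone1 LOFT 1Bed/1Lounge flat", "Double room 20mins to Oxford Street"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, "sample_listings.csv.gz")

    # Write a gzip-compressed CSV without the integer index
    listings.to_csv(gz_path, index=False, compression="gzip")

    # compression defaults to "infer", so the .gz extension is enough here
    restored = pd.read_csv(gz_path)

print(restored)
```

For the export step, `df.to_excel("airbnb.xlsx", index=False)` would leave the integer index out of the sheet. Writing `.xlsx` files also needs an Excel engine such as openpyxl to be installed.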