# Reshaping data: Portland housing developments

In this notebook, we're going to work with some data on Portland (Oregon) housing developments since 2014. Right now, the data are scattered across a jillion spreadsheets. Our goal is to parse them all into one clean CSV. (Thanks to [Kelly Kenoyer of the Portland Mercury](https://twitter.com/Kelly_Kenoyer) for donating this data.)

The spreadsheets, a mixture of `xls` and `xlsx` files, live in `../data/portland/`. A few things to note:
- Some of the spreadsheets have extra columns
- Some of the spreadsheets have other worksheets in addition to the data worksheet (pivot tables, mostly) -- but these are not always in the same position
- Some of the spreadsheets have columns of mostly blank data that the city once used to manually aggregate data by category -- we don't want these columns
- Some of the spreadsheets have blank rows

Our strategy:
- Get a list of Excel files in that directory using the [`glob`](https://docs.python.org/3/library/glob.html) module
- Create an empty pandas data frame
- Loop over the list of spreadsheet files and ...
    - Read in the file to a data frame
    - Find the correct worksheet
    - Drop empty columns and rows
    - Append to the main data frame
    
First, we'll import `glob` and pandas.

In [1]:
import glob

import pandas as pd

Next, we'll use `glob` to get a list of the files we're going to loop over. We'll use the asterisk `*`, which means "match everything."

In [2]:
xl_files = glob.glob('../data/portland/*')

In [3]:
print(xl_files)

['../data/portland/02_2017 new res units.xlsx', '../data/portland/04_2016 New Res Units.xls', '../data/portland/2018 01 New Residential Units - checked.xls', '../data/portland/03_2016 New Res Units.xls', '../data/portland/08_2014 New Res Units.xls', '../data/portland/2017 07 New Residential Units - checked.xls', '../data/portland/2017 08 New Residential Units - checked.xls', '../data/portland/2018 02 New Residential Units - checked.xls', '../data/portland/04_2015 New Res Units.xlsx', '../data/portland/2017 11 New Residential Units - checked.xls', '../data/portland/10_2016 new res units.xlsx', '../data/portland/08_2015 New Res Units.xlsx', '../data/portland/07_2015 New Res Units.xlsx', '../data/portland/05_2017 new res units.xls', '../data/portland/2018 05 New Residential Units - updated.xls', '../data/portland/01_2016 New Res Units.xls', '../data/portland/09_2014 New Res Units.xls', '../data/portland/12_2014 New Res Units.xlsx', '../data/portland/02_2016 New Res Units.xls', '../data/po
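A bare `*` matches every file in the directory, spreadsheet or not. If the folder ever picks up a stray file, a narrower pattern could help. Here is a minimal sketch, assuming we only want paths ending in `.xls` or `.xlsx`; the helper name `list_excel_files` is made up for illustration.

```python
import glob
import os


def list_excel_files(directory):
    """Return a sorted list of Excel-looking paths in `directory`.

    Hypothetical helper, not part of the original notebook.
    """
    # '*.xls*' matches '.xls' and '.xlsx' (it would also match e.g. '.xlsm')
    pattern = os.path.join(directory, '*.xls*')
    return sorted(glob.glob(pattern))


# same directory the notebook uses; returns an empty list if it doesn't exist
excel_only = list_excel_files('../data/portland')
print(len(excel_only))
```

Sorting is optional, but it makes the order of the loop predictable from run to run.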

Now we'll create an empty data frame. This will be the container we stuff the data into as we loop over the files.

In [4]:
housing = pd.DataFrame()

Let's take a look at what we're dealing with. We're going to loop over the spreadsheets, and for each one, we're going to look at:
- The names of the worksheets in that spreadsheet
- The columns in each worksheet

This will help us decide, later, which worksheets we need to target.

We're going to take advantage of the fact, [according to the `read_excel()` documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html), that you can pass `None` as the `sheet_name` argument and pandas will read in _all_ of the sheets as one big dictionary: the keys are the names of the worksheets, and the values are the associated data frames.
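To get a feel for that dictionary without opening a real file, here is a small sketch that builds a stand-in by hand. The sheet names echo ones from our files, but the rows are invented for illustration.

```python
import pandas as pd

# stand-in for what pd.read_excel(path, sheet_name=None) hands back:
# a dictionary of {worksheet name: data frame}. The data here is invented.
sheets = {
    'Sheet1': pd.DataFrame({'total': [3]}),
    'nrs': pd.DataFrame({'foldernumber': ['A-1', 'A-2'],
                         'new_units': [1, 4]}),
}

# record the (rows, columns) shape of each worksheet as we loop
sheet_shapes = {}
for ws_name, ws_df in sheets.items():
    sheet_shapes[ws_name] = ws_df.shape
    print(ws_name, ws_df.shape, list(ws_df.columns))
```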

Later, our logic will go like this:
- Read in every worksheet as a data frame
- Target the worksheet whose name matches the pattern for the data we need

👉 For a refresher on _for loops_ and dictionaries, [check out this notebook](../reference/Python%20data%20types%20and%20basic%20syntax.ipynb#for-loops).

In [5]:
# loop over the excel file paths
for f in xl_files:
    
    # load the file into a data frame
    # specifying `None` as the sheet name
    df = pd.read_excel(f, sheet_name=None)
    
    # print the name of the file
    print(f)
    
    # print the worksheet names
    print(df.keys())
    
    # print a divider to make scanning easier
    print('='*60)
    
    # and an empty line
    print('')

../data/portland/02_2017 new res units.xlsx
odict_keys(['Mercantile units', 'Sheet1', 'nrs'])

../data/portland/04_2016 New Res Units.xls
odict_keys(['Sheet2', 'PIVOT', '04_2016 New Res Units'])

../data/portland/2018 01 New Residential Units - checked.xls
odict_keys(['Summary', '2018 01 New Residential Units'])

../data/portland/03_2016 New Res Units.xls
odict_keys(['pivot', '03_2016 New Res Units'])

../data/portland/08_2014 New Res Units.xls
odict_keys(['08_2014 New Res Units'])

../data/portland/2017 07 New Residential Units - checked.xls
odict_keys(['Sheet1', '2017 07 New Residential Units'])

../data/portland/2017 08 New Residential Units - checked.xls
odict_keys(['Sheet1', '2017 08 New Residential Units'])

../data/portland/2018 02 New Residential Units - checked.xls
odict_keys(['Sheet1', '2018 02 New Residential Units'])

../data/portland/04_2015 New Res Units.xlsx
odict_keys(['Sheet1', '04_2015 New Res Units'])

../data/portland/2017 11 New Residential Units - checked.xls
odic

OK. So it looks like our target sheets are called a few different things: `nrs`, `04_2016 New Res Units`, `2018 04 New Residential Units`, etc.

Can we come up with a list of patterns to match all of them? I think we can.

In [6]:
# the items in this list are lowercased,
# because we're gonna match on .lower()'d versions of the sheet names
target_sheet_name_fragments = ['new res', 'nrs', 'lus stats']
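As a quick sanity check, we could try those fragments against a handful of worksheet names copied from the output above. This is just a sketch; `sample_sheet_names` is a hand-picked list, not the full set of sheets.

```python
target_sheet_name_fragments = ['new res', 'nrs', 'lus stats']

# a few worksheet names copied from the earlier output
sample_sheet_names = ['Mercantile units', 'Sheet1', 'nrs', 'PIVOT',
                      '04_2016 New Res Units',
                      '2018 01 New Residential Units']

# keep the names where at least one fragment shows up in the lowercased name
matches = [
    name for name in sample_sheet_names
    if any(fragment in name.lower()
           for fragment in target_sheet_name_fragments)
]

print(matches)
```

Note that `'new res'` also catches "New Residential Units", which is why one fragment covers both naming styles.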

So now, we need to write some logic that says: Pick the worksheet that has one of our `target_sheet_name_fragments` in the name. A nested pair of _for loops_ will do the trick for us.

In [32]:
# loop over the excel file paths
for f in xl_files:
    
    # load the file into a data frame
    # specifying `None` as the sheet name
    df = pd.read_excel(f, sheet_name=None)
        
    # start off with no match
    match = None
    
    # loop over the worksheet names
    for ws_name in df.keys():
        
        # loop over the word fragments
        for fragment in target_sheet_name_fragments:
            
            # if this fragment exists in the lowercased worksheet name
            if fragment in ws_name.lower():
                
                # we've got a winner
                match = ws_name
    
    # if, when we get to the end of this, `match` is still None
    if not match:
        # print something to let us know about it
        print(f'NO MATCH FOUND FOR {f}')
        
        # and the names of the sheets
        print(df.keys())
        
        # and break out of the loop
        break
    
    # otherwise, grab a handle to the worksheet we want
    working_df = df[match]
    
    # print a status message to let us know what's up
    print(f'parsing "{match}" worksheet from "{f}"')

parsing "nrs" worksheet from "../data/portland/02_2017 new res units.xlsx"
parsing "04_2016 New Res Units" worksheet from "../data/portland/04_2016 New Res Units.xls"
parsing "2018 01 New Residential Units" worksheet from "../data/portland/2018 01 New Residential Units - checked.xls"
parsing "03_2016 New Res Units" worksheet from "../data/portland/03_2016 New Res Units.xls"
parsing "08_2014 New Res Units" worksheet from "../data/portland/08_2014 New Res Units.xls"
parsing "2017 07 New Residential Units" worksheet from "../data/portland/2017 07 New Residential Units - checked.xls"
parsing "2017 08 New Residential Units" worksheet from "../data/portland/2017 08 New Residential Units - checked.xls"
parsing "2018 02 New Residential Units" worksheet from "../data/portland/2018 02 New Residential Units - checked.xls"
parsing "04_2015 New Res Units" worksheet from "../data/portland/04_2015 New Res Units.xlsx"
parsing "2017 11 New Residential Units" worksheet from "../data/portland/2017 11 New

Scanning through that list, I feel comfortable that we're grabbing the correct data. Let's take a look at the columns in each worksheet we'll be parsing.

In [34]:
# loop over the excel file paths
for f in xl_files:
    
    # load the file into a data frame
    # specifying `None` as the sheet name
    df = pd.read_excel(f, sheet_name=None)
        
    # start off with no match
    match = None
    
    # loop over the worksheet names
    for ws_name in df.keys():
        
        # loop over the word fragments
        for fragment in target_sheet_name_fragments:
            
            # if this fragment exists in the lowercased worksheet name
            if fragment in ws_name.lower():
                
                # we've got a winner
                match = ws_name
    
    # if, when we get to the end of this, `match` is still None
    if not match:
        # print something to let us know about it
        print(f'NO MATCH FOUND FOR {f}')
        
        # and the names of the sheets
        print(df.keys())
        
        # and break out of the loop
        break
    
    # otherwise, grab a handle to the worksheet we want
    working_df = df[match]
    
    # print a status message to let us know what's up
    print(f'parsing "{match}" worksheet from "{f}"')
    
    # print a sorted list of column names
    print(sorted(working_df.columns))
    
    # print a divider to make scanning our results easier
    print('='*60)
    
    # print an empty line
    print()

parsing "nrs" worksheet from "../data/portland/02_2017 new res units.xlsx"
['applicant_name', 'architect_names', 'const', 'folder_desc', 'foldernumber', 'folderrsn', 'indate', 'issuedate', 'legaldesc', 'logged', 'new_units', 'occ', 'owner_names', 'project_name', 'prop_address', 'propertyroll', 'propgisid1', 'proplot', 'propplan', 'proptownship', 'rev', 'sqft', 'status', 'sub', 'valuation', 'workdesc']

parsing "04_2016 New Res Units" worksheet from "../data/portland/04_2016 New Res Units.xls"
['applicant_name', 'architect_names', 'const', 'folder_desc', 'foldernumber', 'folderrsn', 'indate', 'issuedate', 'legaldesc', 'logged', 'new_units', 'occ', 'owner_names', 'project_name', 'prop_address', 'propertyroll', 'propgisid1', 'proplot', 'propplan', 'proptownship', 'sqft', 'status', 'sub', 'valuation', 'workdesc']

parsing "2018 01 New Residential Units" worksheet from "../data/portland/2018 01 New Residential Units - checked.xls"
['applicant_name', 'architect_names', 'const', 'folder', 'fo

parsing "03_2015 New Res Units" worksheet from "../data/portland/03_2015 New Res Units.xlsx"
['applicant_name', 'architect_names', 'const', 'folder_desc', 'foldernumber', 'folderrsn', 'indate', 'issuedate', 'legaldesc', 'logged', 'new_units', 'occ', 'owner_names', 'project_name', 'prop_address', 'propertyroll', 'propgisid1', 'proplot', 'propplan', 'proptownship', 'rev', 'sqft', 'status', 'sub', 'valuation', 'workdesc']

parsing "11_2015 New Res Units" worksheet from "../data/portland/11_2015 New Res Units 2.xls"
['applicant_name', 'architect_names', 'const', 'folder_desc', 'foldernumber', 'folderrsn', 'indate', 'issuedate', 'legaldesc', 'logged', 'new_units', 'occ', 'owner_names', 'project_name', 'prop_address', 'propertyroll', 'propgisid1', 'proplot', 'propplan', 'proptownship', 'sqft', 'status', 'sub', 'valuation', 'workdesc']

parsing "07_2016 new res units" worksheet from "../data/portland/07_2016 new res units.xlsx"
['applicant_name', 'architect_names', 'const', 'folder_desc', 'fo

parsing "02_2015 New Res Units" worksheet from "../data/portland/02_2015 New Res Units.xlsx"
['applicant_name', 'architect_names', 'const', 'folder_desc', 'foldernumber', 'folderrsn', 'indate', 'issuedate', 'legaldesc', 'logged', 'new_units', 'occ', 'owner_names', 'project_name', 'prop_address', 'propertyroll', 'propgisid1', 'proplot', 'propplan', 'proptownship', 'rev', 'sqft', 'status', 'sub', 'valuation', 'workdesc']

parsing "04_2017 new res units" worksheet from "../data/portland/04_2017 new res units.xlsx"
['applicant_name', 'architect_names', 'const', 'folder_desc', 'foldernumber', 'folderrsn', 'indate', 'issuedate', 'legaldesc', 'logged', 'new_units', 'occ', 'owner_names', 'project_name', 'prop_address', 'propertyroll', 'propgisid1', 'proplot', 'propplan', 'proptownship', 'rev', 'sqft', 'status', 'sub', 'valuation', 'workdesc']

parsing "06_2015 New Res Units" worksheet from "../data/portland/06_2015 New Res Units.xlsx"
['applicant_name', 'architect_names', 'const', 'folder_desc
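Eyeballing long lists like these gets tiring, so here is a sketch of one way to let Python spot the differences using sets. The column lists below are abbreviated from the output above and the file labels are shortened, so treat `columns_by_file` as illustrative.

```python
# abbreviated column lists from three of the worksheets above
columns_by_file = {
    '02_2017': ['applicant_name', 'rev', 'sqft', 'valuation'],
    '04_2016': ['applicant_name', 'sqft', 'valuation'],
    '03_2015': ['applicant_name', 'rev', 'sqft', 'valuation'],
}

# every column name that appears in at least one worksheet
all_columns = set().union(*columns_by_file.values())

# for each worksheet, which of those columns is it missing?
missing_by_file = {
    label: sorted(all_columns - set(cols))
    for label, cols in columns_by_file.items()
}

for label, missing in missing_by_file.items():
    print(label, missing)
```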

I notice that some columns have names like `Unnamed: 4`. That's the placeholder pandas assigns when a column has no header. Let's take a look at one of those:

In [35]:
test = pd.read_excel('../data/portland/08_2014 New Res Units.xls', sheet_name='08_2014 New Res Units')

In [39]:
test.head(20)

,foldernumber,prop_address,sub,new_units,Unnamed: 4,valuation,Unnamed: 6,folder_desc,occ,workdesc,...,legaldesc,proptownship,propplan,folderrsn,sqft,logged,project_name,applicant_name,architect_names,owner_names
0,14-164434-000-00-RS,4215 NE 20TH AVE - Unit B,Accessory Dwelling Unit,1.0,,35734.0,,CONVERT EXISTING DETACHED GARAGE TO ACCESSORY ...,R-3_Residential One and Two Family,Alteration,...,,,R860713900,3469651.0,324.0,OK,4215 NE 20TH AVE Unit B,CLEARWATER CONSTRUCTION *RYAN *,NONE,LEANNE WHITE
1,14-170323-000-00-RS,6453 SE RAYMOND ST,Accessory Dwelling Unit,1.0,,89517.0,,NEW DETACHED TWO STORY ADU IN REAR YARD/GARAGE...,R-3_Residential One and Two Family,New Construction,...,,,R036900810,3475646.0,942.0,OK,6453 SE RAYMOND ST,IVON STREET STUDIO *MICHELLE JERESEK*,NONE,"RYAN J KUNZER, WENDY BOSO"
2,14-121509-000-00-RS,1610 SE ASH ST,Accessory Dwelling Unit,1.0,,55240.0,,NEW DETACHED ADU WITH 128 SQ FT MEZZANINE FOR ...,R-3_Residential One and Two Family,New Construction,...,,,R941350600,3426008.0,748.0,OK,204 SE 16TH AVE,"DMS ARCHITECTS, INC *DAVE SPITZER*",DAVE SPITZER,"JAMES K FORRISTAL, ADA B KARDOS"
3,14-175678-000-00-RS,4626 NE 31ST AVE - Unit B,Accessory Dwelling Unit,1.0,,48969.0,,NEW ACCESSORY DWELLING UNIT / 1 STORY / NO GA...,R-3_Residential One and Two Family,New Construction,...,,,R242302210,3481070.0,444.0,OK,4626 NE 31ST AVE Unit B,ASHLEY E BARTON,NONE,ASHLEY E BARTON
4,14-180679-000-00-RS,6120 NE 22ND AVE - Unit B,Accessory Dwelling Unit,1.0,,35624.0,,NEW ACCESSORY DWELLING UNIT/1-STORY/FLAT LOT/C...,R-3_Residential One and Two Family,New Construction,...,,,R421302240,3486127.0,323.0,OK,6120 NE 22ND AVE,"POLYPHON ARCHITECTURE & DESIGN, LLC *SCHUYLER ...",NONE,"MARC B PETERS, FAUSTINA M ASH"
5,14-199311-000-00-RS,3019 SE 21ST AVE - Unit B,Accessory Dwelling Unit,1.0,,3000.0,,CONVERT EXISTING SHARED GARAGE TO ACCESSORY DW...,R-3_Residential One and Two Family,Alteration,...,,,R834303130,3504956.0,0.0,OK,3019 SE 21ST AVE,COLIN SHARP,NONE,"COLIN SHARP, KRISTEN SHARP"
6,12-203807-000-00-RS,5324 NE 28TH AVE - Unit B,Accessory Dwelling Unit,1.0,,78979.0,,NEW 760 SF ACCESSORY DWELLING UNIT/2 STORY/FLA...,R-3_Residential One and Two Family,New Construction,...,,,R413901710,3242638.0,760.0,OK,5324 NE 28TH AVE Unit B,MENTRUM ARCHITECTURE *BAYARD MENTRUM*,NONE,JENNIFER FARMER
7,14-160745-000-00-RS,2116 NE ROSA PARKS WAY - Unit B,Accessory Dwelling Unit,1.0,,55145.0,,NEW DETACHED ADU ABOVE TWO CAR GARAGE/FLAT LOT...,R-3_Residential One and Two Family,New Construction,...,,,R173101200,3465918.0,500.0,OK,2116 NE ROSA PARKS WAY Unit B,URBAN VISIONS *KEVIN PARTAIN*,NONE,"DOSHIE L CLARK, CAROLYN F JOHNSON"
8,14-192361-000-00-RS,5716 NE 32ND AVE - Unit B,Accessory Dwelling Unit,1.0,9.0,18013.0,420221.0,ADDITION TO EXISTING GARAGE TO CREATE NEW ACCE...,R-3_Residential One and Two Family,Addition,...,,,R421318050,3497935.0,350.0,OK,5716 NE 32ND AVE Unit B,ERIN R CONNOR,NONE,ERIN R CONNOR
9,,,,,,,,,,,...,,,,,,,,,,


Looks like they're using those columns to total up the valuations for groups of housing types. I'm noticing, too, that there are some blank rows -- probably used as dividers between groups -- so we'll want to drop those as well.

We'll keep that in mind as we roll through these sheets.
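Before running this on the real files, here is the cleanup idea on a toy data frame. The values are invented; the shape mimics what we just saw (a subtotal tucked into an unnamed column, followed by a blank divider row).

```python
import pandas as pd

# invented rows: two permits, then a blank divider row
toy = pd.DataFrame({
    'foldernumber': ['14-1', '14-2', None],
    'new_units': [1.0, 1.0, None],
    'Unnamed: 4': [None, 2.0, None],   # manual subtotal column
})

# find the headerless columns; str() guards against non-string headers
unnamed = [col for col in toy.columns if 'Unnamed' in str(col)]

# drop those columns, then drop rows where every remaining value is null
cleaned = toy.drop(columns=unnamed).dropna(how='all')

print(cleaned)
```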

Here's the pandas documentation on the methods we'll be using here:
- [`append()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html)
- [`drop()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html)
- [`dropna()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html)

In [44]:
# loop over the excel file paths
for f in xl_files:
    
    # load the file into a data frame
    # specifying `None` as the sheet name
    df = pd.read_excel(f, sheet_name=None)

    # start off with no match
    match = None
    
    # loop over the worksheet names
    for ws_name in df.keys():
        
        # loop over the word fragments
        for fragment in target_sheet_name_fragments:
            
            # if this fragment exists in the lowercased worksheet name
            if fragment in ws_name.lower():
                
                # we've got a winner
                match = ws_name

    # if, when we get to the end of this, `match` is still None
    if not match:
        # print something to let us know about it
        print(f'NO MATCH FOUND FOR {f}')
        
        # and the names of the sheets
        print(df.keys())
        
        # and break out of the loop
        break
                
    # otherwise, grab a handle to the worksheet we want
    working_df = df[match]
    
    # print a status message to let us know what's up
    print(f'parsing "{match}" worksheet from "{f}"')
    
    # get a list of columns we want to drop
    columns_to_drop = [x for x in working_df.columns if 'Unnamed' in str(x)]
    
    # drop those bad boys
    working_df = working_df.drop(columns_to_drop, axis=1)

    # drop empty rows in place, but only if _all_ of the values are nulls
    working_df.dropna(inplace=True, how='all')
    
    # append to our `housing` data frame
    housing = housing.append(working_df,
                             ignore_index=True,
                             sort=True)

parsing "nrs" worksheet from "../data/portland/02_2017 new res units.xlsx"
parsing "04_2016 New Res Units" worksheet from "../data/portland/04_2016 New Res Units.xls"
parsing "2018 01 New Residential Units" worksheet from "../data/portland/2018 01 New Residential Units - checked.xls"
parsing "03_2016 New Res Units" worksheet from "../data/portland/03_2016 New Res Units.xls"
parsing "08_2014 New Res Units" worksheet from "../data/portland/08_2014 New Res Units.xls"
parsing "2017 07 New Residential Units" worksheet from "../data/portland/2017 07 New Residential Units - checked.xls"
parsing "2017 08 New Residential Units" worksheet from "../data/portland/2017 08 New Residential Units - checked.xls"
parsing "2018 02 New Residential Units" worksheet from "../data/portland/2018 02 New Residential Units - checked.xls"
parsing "04_2015 New Res Units" worksheet from "../data/portland/04_2015 New Res Units.xlsx"
parsing "2017 11 New Residential Units" worksheet from "../data/portland/2017 11 New
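A note for anyone running this on a newer pandas: `DataFrame.append()` was deprecated in pandas 1.4 and removed in 2.0. One common replacement, sketched here with two invented toy frames, is to collect the cleaned data frames in a plain list and call `pd.concat()` once at the end.

```python
import pandas as pd

# in the real loop, each cleaned `working_df` would be appended to this list
frames = []
frames.append(pd.DataFrame({'foldernumber': ['14-1'], 'rev': ['A']}))
frames.append(pd.DataFrame({'foldernumber': ['16-2'], 'sqft': [800.0]}))

# stack them all at once; sort=True orders the columns alphabetically,
# ignore_index=True renumbers the rows from zero
combined = pd.concat(frames, ignore_index=True, sort=True)

print(combined)
```

Columns that exist in only some frames (like `rev` here) are kept and filled with nulls where they're missing.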

In [47]:
housing.head()

,applicant_name,architect_names,const,folder,folder_desc,foldernumber,folderrsn,indate,issuedate,legaldesc,...,propgisid1,proplot,propplan,proptownship,rev,sqft,status,sub,valuation,workdesc
0,URBAN VISIONS *KEVIN PARTAIN*,NONE,V-B,,NEW SINGLE FAMILY RESIDENCE/TWO STORY WITH BAS...,14-134373-000-00-RS,3439167.0,2014-04-02 00:00:00,2014-08-12 12:49:38,5009-5011 NE 26TH AVE *,...,1N1E24BB 23300,R186548,R413904750,5393.0,,3054.0,Under Inspection,Single Family Dwelling,299734.0,New Construction
1,UDG Umatilla LLC *David Mullens*,NONE,V-A,,NEW 4 STORY WOOD FRAME APARTMENT BUILDING WITH...,15-165921-000-00-CO,3628809.0,2015-05-06 00:00:00,2016-01-19 15:40:19,8235-8237 SE 17TH AVE *,...,1S1E23DC 7300,R267583,R752712850,1338.0,,58576.0,Under Inspection,Apartments/Condos (3 or more units),9000000.0,New Construction
2,URBAN VISIONS *KEVIN PARTAIN*,NONE,V-B,,NEW SINGLE FAMILY RESIDENCE/TWO STORY WITH BAS...,14-134384-000-00-RS,3439178.0,2014-04-02 00:00:00,2014-09-03 10:56:44,5009-5011 NE 26TH AVE *,...,1N1E24BB 23300,R186548,R413904750,5393.0,,3054.0,Under Inspection,Single Family Dwelling,299734.0,New Construction
3,CONCEPT DESIGN & ASSOCIATES *KYM NGUYEN*,NONE,V-B,,NEW SINGLE FAMILY RESIDENCE / 3-STORY (INCLUDE...,14-186131-000-00-RS,3491628.0,2014-07-18 00:00:00,2017-02-09 10:58:15,,...,1S1E17CA 9600,R146946,R208002770,,,4941.0,Under Inspection,Single Family Dwelling,344731.0,New Construction
4,JEREMY M ECKERT,NONE,V-B,,NEW DETACHED ACCESSORY DWELLING UNIT/2-STORY/N...,16-198319-000-00-RS,3848149.0,2016-07-01 00:00:00,2017-02-03 10:22:04,,...,1S1E01CA 13500,R269447,R763200190,,,800.0,Issued,Accessory Dwelling Unit,90120.0,New Construction


In [48]:
len(housing)

5434

In [49]:
housing.dtypes

applicant_name      object
architect_names     object
const               object
folder              object
folder_desc         object
foldernumber        object
folderrsn          float64
indate              object
issuedate           object
legaldesc           object
logged              object
new_units          float64
occ                 object
owner_names         object
project_name        object
prop_address        object
propertyroll        object
propgisid1          object
proplot             object
propplan            object
proptownship       float64
rev                 object
sqft               float64
status              object
sub                 object
valuation          float64
workdesc            object
dtype: object

One last thing I'd do, before writing out to file, is parse the date columns as dates:

In [50]:
# convert "indate" column to datetime
housing.indate = pd.to_datetime(housing.indate)

# convert "issuedate" column to datetime
housing.issuedate = pd.to_datetime(housing.issuedate)

In [52]:
housing.head()

,applicant_name,architect_names,const,folder,folder_desc,foldernumber,folderrsn,indate,issuedate,legaldesc,...,propgisid1,proplot,propplan,proptownship,rev,sqft,status,sub,valuation,workdesc
0,URBAN VISIONS *KEVIN PARTAIN*,NONE,V-B,,NEW SINGLE FAMILY RESIDENCE/TWO STORY WITH BAS...,14-134373-000-00-RS,3439167.0,2014-04-02,2014-08-12 12:49:38,5009-5011 NE 26TH AVE *,...,1N1E24BB 23300,R186548,R413904750,5393.0,,3054.0,Under Inspection,Single Family Dwelling,299734.0,New Construction
1,UDG Umatilla LLC *David Mullens*,NONE,V-A,,NEW 4 STORY WOOD FRAME APARTMENT BUILDING WITH...,15-165921-000-00-CO,3628809.0,2015-05-06,2016-01-19 15:40:19,8235-8237 SE 17TH AVE *,...,1S1E23DC 7300,R267583,R752712850,1338.0,,58576.0,Under Inspection,Apartments/Condos (3 or more units),9000000.0,New Construction
2,URBAN VISIONS *KEVIN PARTAIN*,NONE,V-B,,NEW SINGLE FAMILY RESIDENCE/TWO STORY WITH BAS...,14-134384-000-00-RS,3439178.0,2014-04-02,2014-09-03 10:56:44,5009-5011 NE 26TH AVE *,...,1N1E24BB 23300,R186548,R413904750,5393.0,,3054.0,Under Inspection,Single Family Dwelling,299734.0,New Construction
3,CONCEPT DESIGN & ASSOCIATES *KYM NGUYEN*,NONE,V-B,,NEW SINGLE FAMILY RESIDENCE / 3-STORY (INCLUDE...,14-186131-000-00-RS,3491628.0,2014-07-18,2017-02-09 10:58:15,,...,1S1E17CA 9600,R146946,R208002770,,,4941.0,Under Inspection,Single Family Dwelling,344731.0,New Construction
4,JEREMY M ECKERT,NONE,V-B,,NEW DETACHED ACCESSORY DWELLING UNIT/2-STORY/N...,16-198319-000-00-RS,3848149.0,2016-07-01,2017-02-03 10:22:04,,...,1S1E01CA 13500,R269447,R763200190,,,800.0,Issued,Accessory Dwelling Unit,90120.0,New Construction
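Our dates parsed without complaint, but hand-maintained spreadsheets don't always cooperate. If a column held a value that isn't a date, one option is `errors='coerce'`, which turns anything unparseable into `NaT` instead of raising an error. A sketch with invented values:

```python
import pandas as pd

# two well-formed timestamps and one invented bad value
raw = pd.Series(['2014-04-02 00:00:00',
                 '2016-01-19 15:40:19',
                 'not a date'])

# unparseable entries become NaT rather than stopping the conversion
parsed = pd.to_datetime(raw, errors='coerce')

print(parsed)
print(parsed.isna().sum(), 'value(s) could not be parsed')
```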


Now we can use the `to_csv()` method to write out to a new file:

In [51]:
housing.to_csv('portland-developments.csv', index=False)
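One thing to keep in mind: a CSV doesn't remember data types, so the date columns come back as plain text unless we ask pandas to parse them again. Here is a sketch of the round trip using an in-memory buffer and a one-row toy frame (invented for illustration) in place of the real file.

```python
import io

import pandas as pd

# a one-row stand-in for the `housing` data frame
toy = pd.DataFrame({
    'foldernumber': ['14-134373-000-00-RS'],
    'indate': pd.to_datetime(['2014-04-02']),
    'issuedate': pd.to_datetime(['2014-08-12 12:49:38']),
})

# write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# `parse_dates` tells read_csv which columns to convert back to datetimes
reloaded = pd.read_csv(buffer, parse_dates=['indate', 'issuedate'])

print(reloaded.dtypes)
```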