# Group Assignment #3: A Clean Notebook
## Focus and Control BART Stations: Adding a Function and a Loop


For this week's assignment, we are going to clean the California Rail dataset so that it only includes our project's focus stations and control stations. We will also designate marker types for the stations to signify if they are control or not. Additionally, we will create a function that will print out the station name and whether it is a TOD project location or not. We will then loop through this function so that each of our ten stations (five focus and five control) will print out with this information. 

We'll start by bringing in the necessary libraries and importing our data. Nicole already cleaned the California Rail dataset in a previous assignment (week 4) to produce a downloadable file of just our five focus stations. We will load that dataset, but we need to clean the original dataset again to get a variable that also includes our control stations. 

In [None]:
import pandas as pd # this is for regular data analysis
import geopandas as gpd # this is for geospatial analysis
import contextily as ctx # this is for basemaps
import matplotlib.pyplot as plt # this is for plotting



In [None]:
bart = gpd.read_file('Data/California_Rail_Stations.geojson')
bart.head()

Unnamed: 0,OBJECTID,LOCATION,STATION,CODE,ADDRESS,ZIP,PASS_OP,PASS_NETWO,COMM_OP,COMM_NETWO,BUS_ROUTES,TRANSIT,AIRPORT,STATION_TY,INTERMODAL,DIST,CO,geometry
0,1,Parking Garage,MONTEREY - Parking Garage,-,"Tyler, between Del Monte & Franklin",93940,,,,,55,,,2,0,5,MON,POINT (-121.89330 36.60033)
1,2,El Segundo,EL SEGUNDO,ESG,700 South Douglas Street,90245,,,,,1c,,,2,0,7,LA,POINT (-118.38342 33.90512)
2,3,Morgan Hill,MORGAN HILL,MHC,17200 Depot Street,95037,,,PCJPB,Caltrain,55,,,3,1,4,SCL,POINT (-121.65053 37.12966)
3,4,Amtrak Station,OAKLAND COLISEUM,OAC,700 Seventy-third Avenue,94621,Amtrak,Capitol Corridor,,,,"AC Transit,BART",AirBART connector to OAK,6,1,4,ALA,POINT (-122.19820 37.75252)
4,5,Amtrak Station,SANTA BARBARA,SBA,209 State Street,93101,Amtrak,"Coast Starlight,Pacific Surfliner",,,4101721,Santa Barbara MTD,,5,1,5,SB,POINT (-119.69260 34.41430)


In [6]:
tod= gpd.read_file('focus_stations.geojson')
tod.head()

Unnamed: 0,STATION,OBJECTID,LOCATION,CODE,ADDRESS,ZIP,COMM_NETWO,STATION_TY,DIST,CO,geometry
0,ASHBY,215,,,,0,BART,1,4,ALA,POINT (-122.27012 37.85321)
1,MACARTHUR,217,,,,0,BART,1,4,ALA,POINT (-122.26712 37.82871)
2,HAYWARD,227,,,,0,BART,1,4,ALA,POINT (-122.08720 37.67012)
3,SOUTH HAYWARD,228,,,,0,BART,1,4,ALA,POINT (-122.05704 37.63448)
4,FRUITVALE,222,,,,0,BART,1,4,ALA,POINT (-122.22420 37.77456)


Great, everything loaded perfectly. We can immediately see the differences between the two datasets. "bart" has a bunch of columns we don't need and has all the rail stations in California. I'm also noticing that my "tod" variable has missing information. Apparently these stations did not have their location, code, address, or zip code. I'll make a note to clean up the columns so that we just have the station name, objectid, location (I'll fill in the city the station is in), code (I'll fill in an abbreviation of the station name), the comm_netwo, co (for county) and geometry. 

First things first though, I'm going to clean up the large dataset to only have our ten necessary stations.

In [7]:
# getting columns of dataset to start taking them out
bart.columns.to_list()

['OBJECTID',
 'LOCATION',
 'STATION',
 'CODE',
 'ADDRESS',
 'ZIP',
 'PASS_OP',
 'PASS_NETWO',
 'COMM_OP',
 'COMM_NETWO',
 'BUS_ROUTES',
 'TRANSIT',
 'AIRPORT',
 'STATION_TY',
 'INTERMODAL',
 'DIST',
 'CO',
 'geometry']

In [8]:
bart['CO'].value_counts()

LA     38
ALA    30
SM     19
SCL    18
RIV    17
SF     17
SBD    13
CC     12
ORA    11
MON    11
SB      9
MPA     8
SD      8
VEN     8
SJ      7
HUM     6
KIN     5
SON     5
SLO     5
MEN     4
KER     4
        4
PLA     4
SAC     3
SOL     3
ED      3
SCR     2
NAP     2
TUL     2
STA     2
BUT     2
SHA     2
YUB     1
FRE     1
YOL     1
MAD     1
MER     1
NEV     1
TEH     1
SIS     1
Name: CO, dtype: int64

I found that there are 30 stops in Alameda County (ALA). Based on the columns I have in my other dataset, I know which ones I need to keep.

In [9]:
columns_to_keep = ['OBJECTID',
                   'LOCATION',
                   'STATION',
                   'CODE',
                   'ADDRESS',
                   'ZIP',
                   'COMM_NETWO',
                   'STATION_TY',
                   'DIST',
                   'CO',
                   'geometry']

In [10]:
bart2 = bart[columns_to_keep].copy() # I created a new variable bart2 (as a copy) so the original dataset stays unchanged
bart2.sample(15)

Unnamed: 0,OBJECTID,LOCATION,STATION,CODE,ADDRESS,ZIP,COMM_NETWO,STATION_TY,DIST,CO,geometry
203,204,,CAPITOL,,,0,Caltrain,1,4,SCL,POINT (-121.84184 37.28393)
51,52,Amtrak Station,EMERYVILLE,EMY,5885 Horton St.,94608,,5,4,ALA,POINT (-122.29146 37.84053)
180,181,,22ND ST,,,0,Caltrain,1,4,SF,POINT (-122.39275 37.75736)
277,278,,Montclair,,,0,San Bernardino Line,1,8,SBD,POINT (-117.69567 34.09392)
28,29,Taco Bell,RED BLUFF,RBF,228 Main St.,96080,,2,2,TEH,POINT (-122.23095 40.17270)
121,122,Regional Trans. Center,SANTA ANA,SNA,1000 Santa Ana Blvd.,92701,"Inland Empire-Orange County Line,Orange County...",8,12,ORA,POINT (-117.85600 33.75150)
17,18,Amtrak Station,DUNSMUIR,DUN,5750 Sacramento Ave.,96025,,1,2,SIS,POINT (-122.27056 41.21245)
88,89,Amtrak/Metrolink Station,CHATSWORTH,CWT,10040 Old Depot Plaza Rd.,91311,Ventura County Line,5,7,LA,POINT (-118.59945 34.25292)
267,268,,Pomona (North),,,0,San Bernardino Line,1,7,LA,POINT (-117.75305 34.09357)
256,257,,Carlsbad Poinsettia,,,0,Coaster,1,11,SD,POINT (-117.31904 33.10875)


In [11]:
bart2 = bart2[bart2.CO == 'ALA'] #I'm only wanting to keep the rows that have "ALA" in the CO column
bart2.sample(10)

Unnamed: 0,OBJECTID,LOCATION,STATION,CODE,ADDRESS,ZIP,COMM_NETWO,STATION_TY,DIST,CO,geometry
226,227,,HAYWARD,,,0,BART,1,4,ALA,POINT (-122.08720 37.67012)
215,216,,ROCKRIDGE,,,0,BART,1,4,ALA,POINT (-122.25178 37.84451)
225,226,,CASTRO VALLEY,,,0,BART,1,4,ALA,POINT (-122.07642 37.69072)
53,54,Across Mission Blvd. from Caltrans Park-n-Ride...,FREMONT/MISSION SAN JOSE,FRT,Upper Mission Blvd (Hwy 238) at I-680,94538,,2,4,ALA,POINT (-121.92382 37.53955)
222,223,,COLISEUM/OAKLAND AIRPORT (OAK),,,0,BART,1,4,ALA,POINT (-122.19718 37.75386)
50,51,Amtrak Station,HAYWARD,HAY,"22555 Meekland Ave. at ""A"" St.",94545,,4,4,ALA,POINT (-122.09869 37.66666)
224,225,,BAY FAIR,,,0,BART,1,4,ALA,POINT (-122.12706 37.69757)
48,49,LAVTA Transit Center,LIVERMORE,LIV,2500 Railroad Ave. (near 1st St.),94550,ACE,3,4,ALA,POINT (-121.76750 37.68501)
214,215,,ASHBY,,,0,BART,1,4,ALA,POINT (-122.27012 37.85321)
52,53,BART Station,DUBLIN/PLEASANTON,DBP,I-580 at Hopyard (Mid-Day Parking),94588,BART,3,4,ALA,POINT (-121.89849 37.70247)


In [12]:
bart2['STATION'].value_counts()

HAYWARD                           2
OAKLAND COLISEUM                  1
MACARTHUR                         1
FREMONT                           1
UNION CITY                        1
SOUTH HAYWARD                     1
CASTRO VALLEY                     1
BAY FAIR                          1
SAN LEANDRO                       1
COLISEUM/OAKLAND AIRPORT (OAK)    1
FRUITVALE                         1
LAKE MERRITT                      1
12TH ST/OAKLAND CITY CENTER       1
WEST OAKLAND                      1
19TH STREET/OAKLAND               1
ROCKRIDGE                         1
BERKELEY                          1
ASHBY                             1
DOWNTOWN BERKELEY                 1
NORTH BERKELEY                    1
PLEASANTON                        1
VASCO ROAD                        1
FREMONT/MISSION SAN JOSE          1
DUBLIN/PLEASANTON                 1
EMERYVILLE                        1
OAKLAND                           1
LIVERMORE                         1
FREMONT/CENTERVILLE         

Chaithra and I picked a few stations to focus on since they are the ones that had TOD projects completed. They are Fruitvale Transit Village (Fruitvale BART, Oakland), Hayward BART station (Hayward), MacArthur BART station (Oakland), South Hayward BART station (Hayward), and Ashby BART station (Berkeley).

For our control stations (non-TOD BART stations), I'm going to pick stations that fall in the same city as our TOD stations, or right next to it, and are within two stops of our TOD stations, to try to control for any geographical or socio-economic differences. I've decided on North Berkeley (Berkeley), 19th Street/Oakland (Oakland), Lake Merritt (Oakland), Bay Fair (next to Hayward), and Union City (next to South Hayward).



In [13]:
bart2= bart2.set_index("STATION") #this is to set the station names as the index so it is easier to delete the rows I don't need
bart2.head()

Unnamed: 0_level_0,OBJECTID,LOCATION,CODE,ADDRESS,ZIP,COMM_NETWO,STATION_TY,DIST,CO,geometry
STATION,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
OAKLAND COLISEUM,4,Amtrak Station,OAC,700 Seventy-third Avenue,94621,,6,4,ALA,POINT (-122.19820 37.75252)
BERKELEY,47,Amtrak Station,BKY,University Ave. & 3rd St.,94710,,4,4,ALA,POINT (-122.30087 37.86742)
FREMONT/CENTERVILLE,48,Amtrak Station,FMT,37260 Fremont Blvd. at Peralta Blvd.,94536,ACE,5,4,ALA,POINT (-122.00735 37.55890)
LIVERMORE,49,LAVTA Transit Center,LIV,2500 Railroad Ave. (near 1st St.),94550,ACE,3,4,ALA,POINT (-121.76750 37.68501)
OAKLAND,50,Amtrak Station,OKJ,245 2nd St.,94607,,5,4,ALA,POINT (-122.27148 37.79415)


I needed to set the station names as the index so I can use .loc and just copy and paste the station names without trying to find their original index numbers. Now I can use .loc to isolate the stations I need and create a variable with those stations.
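As a minimal sketch of this label-based selection, the toy table below (made-up rows rather than the project data; `toy` and `picked` are hypothetical names) shows how set_index followed by .loc with a list of names behaves. One thing it illustrates: a label that appears twice in the index returns both rows.

```python
import pandas as pd

# Toy data standing in for the rail table (values are illustrative only)
toy = pd.DataFrame({
    "STATION": ["ASHBY", "HAYWARD", "HAYWARD", "ROCKRIDGE"],
    "COMM_NETWO": ["BART", "", "BART", "BART"],
})

# Use the station name as the row label
toy = toy.set_index("STATION")

# .loc with a list of labels keeps only those rows, in the order listed;
# a label that occurs twice in the index brings back both of its rows
picked = toy.loc[["HAYWARD", "ASHBY"]]
print(picked)
```

Here `picked` has three rows, not two, which is why a follow-up filter on another column is needed when station names repeat.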

In [14]:
bart2.loc[['ASHBY',
           'MACARTHUR',
           'HAYWARD',
           'SOUTH HAYWARD',
           'FRUITVALE',
           'NORTH BERKELEY',
           '19TH STREET/OAKLAND',
           'BAY FAIR',
           'UNION CITY',
           'LAKE MERRITT']]

Unnamed: 0_level_0,OBJECTID,LOCATION,CODE,ADDRESS,ZIP,COMM_NETWO,STATION_TY,DIST,CO,geometry
STATION,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
ASHBY,215,,,,0,BART,1,4,ALA,POINT (-122.27012 37.85321)
MACARTHUR,217,,,,0,BART,1,4,ALA,POINT (-122.26712 37.82871)
HAYWARD,51,Amtrak Station,HAY,"22555 Meekland Ave. at ""A"" St.",94545,,4,4,ALA,POINT (-122.09869 37.66666)
HAYWARD,227,,,,0,BART,1,4,ALA,POINT (-122.08720 37.67012)
SOUTH HAYWARD,228,,,,0,BART,1,4,ALA,POINT (-122.05704 37.63448)
FRUITVALE,222,,,,0,BART,1,4,ALA,POINT (-122.22420 37.77456)
NORTH BERKELEY,213,,,,0,BART,1,4,ALA,POINT (-122.28335 37.87406)
19TH STREET/OAKLAND,218,,,,0,BART,1,4,ALA,POINT (-122.26839 37.80808)
BAY FAIR,225,,,,0,BART,1,4,ALA,POINT (-122.12706 37.69757)
UNION CITY,229,,,,0,BART,1,4,ALA,POINT (-122.01715 37.59087)


In [15]:
stations = bart2.loc[['ASHBY',
                      'MACARTHUR',
                      'HAYWARD',
                      'SOUTH HAYWARD',
                      'FRUITVALE',
                      'NORTH BERKELEY',
                      '19TH STREET/OAKLAND',
                      'BAY FAIR',
                      'UNION CITY',
                      'LAKE MERRITT']]
stations.head()

Unnamed: 0_level_0,OBJECTID,LOCATION,CODE,ADDRESS,ZIP,COMM_NETWO,STATION_TY,DIST,CO,geometry
STATION,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
ASHBY,215,,,,0,BART,1,4,ALA,POINT (-122.27012 37.85321)
MACARTHUR,217,,,,0,BART,1,4,ALA,POINT (-122.26712 37.82871)
HAYWARD,51,Amtrak Station,HAY,"22555 Meekland Ave. at ""A"" St.",94545,,4,4,ALA,POINT (-122.09869 37.66666)
HAYWARD,227,,,,0,BART,1,4,ALA,POINT (-122.08720 37.67012)
SOUTH HAYWARD,228,,,,0,BART,1,4,ALA,POINT (-122.05704 37.63448)


I created a new variable with just our stations of focus. However, I see there are two Haywards: the Amtrak station and the BART station. I'll keep only the BART rows to get rid of the one we don't need. 

In [16]:
stations = stations[stations.COMM_NETWO == 'BART']
stations.head(11)

Unnamed: 0_level_0,OBJECTID,LOCATION,CODE,ADDRESS,ZIP,COMM_NETWO,STATION_TY,DIST,CO,geometry
STATION,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
ASHBY,215,,,,0,BART,1,4,ALA,POINT (-122.27012 37.85321)
MACARTHUR,217,,,,0,BART,1,4,ALA,POINT (-122.26712 37.82871)
HAYWARD,227,,,,0,BART,1,4,ALA,POINT (-122.08720 37.67012)
SOUTH HAYWARD,228,,,,0,BART,1,4,ALA,POINT (-122.05704 37.63448)
FRUITVALE,222,,,,0,BART,1,4,ALA,POINT (-122.22420 37.77456)
NORTH BERKELEY,213,,,,0,BART,1,4,ALA,POINT (-122.28335 37.87406)
19TH STREET/OAKLAND,218,,,,0,BART,1,4,ALA,POINT (-122.26839 37.80808)
BAY FAIR,225,,,,0,BART,1,4,ALA,POINT (-122.12706 37.69757)
UNION CITY,229,,,,0,BART,1,4,ALA,POINT (-122.01715 37.59087)
LAKE MERRITT,221,,,,0,BART,1,4,ALA,POINT (-122.26554 37.79768)


Now I'm going to reset the index. This will help me later when I'm using my function and loop. I will also refine my columns so that I only have the necessary ones. 
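To illustrate what resetting the index does, here is a small sketch on a toy table (`toy` and `flat` are hypothetical names, and the two rows are only examples). After reset_index, the station names become an ordinary column and the rows are numbered from 0, so a row's label and its position are the same number.

```python
import pandas as pd

# Toy table indexed by station name, like `stations` before the reset
toy = pd.DataFrame(
    {"TOD": ["Yes", "No"]},
    index=pd.Index(["ASHBY", "BAY FAIR"], name="STATION"),
)

# reset_index moves the STATION labels into a regular column
# and replaces the index with 0, 1, 2, ...
flat = toy.reset_index()

for i in flat.index:
    # With a default integer index, label i and position i are the same row
    assert flat.loc[i, "STATION"] == flat.iloc[i, 0]
    print(i, flat.iloc[i, 0])
```

This is what lets a loop over the index hand each number to a function that looks rows up by position.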

In [17]:
stations = stations.reset_index()
stations.head()

Unnamed: 0,STATION,OBJECTID,LOCATION,CODE,ADDRESS,ZIP,COMM_NETWO,STATION_TY,DIST,CO,geometry
0,ASHBY,215,,,,0,BART,1,4,ALA,POINT (-122.27012 37.85321)
1,MACARTHUR,217,,,,0,BART,1,4,ALA,POINT (-122.26712 37.82871)
2,HAYWARD,227,,,,0,BART,1,4,ALA,POINT (-122.08720 37.67012)
3,SOUTH HAYWARD,228,,,,0,BART,1,4,ALA,POINT (-122.05704 37.63448)
4,FRUITVALE,222,,,,0,BART,1,4,ALA,POINT (-122.22420 37.77456)


In [18]:
list(stations)

['STATION',
 'OBJECTID',
 'LOCATION',
 'CODE',
 'ADDRESS',
 'ZIP',
 'COMM_NETWO',
 'STATION_TY',
 'DIST',
 'CO',
 'geometry']

In [19]:
keep = ['STATION',
        'LOCATION',
        'CODE',
        'COMM_NETWO',
        'CO',
        'geometry']

In [20]:
stations = stations[keep]
stations.head(10)

Unnamed: 0,STATION,LOCATION,CODE,COMM_NETWO,CO,geometry
0,ASHBY,,,BART,ALA,POINT (-122.27012 37.85321)
1,MACARTHUR,,,BART,ALA,POINT (-122.26712 37.82871)
2,HAYWARD,,,BART,ALA,POINT (-122.08720 37.67012)
3,SOUTH HAYWARD,,,BART,ALA,POINT (-122.05704 37.63448)
4,FRUITVALE,,,BART,ALA,POINT (-122.22420 37.77456)
5,NORTH BERKELEY,,,BART,ALA,POINT (-122.28335 37.87406)
6,19TH STREET/OAKLAND,,,BART,ALA,POINT (-122.26839 37.80808)
7,BAY FAIR,,,BART,ALA,POINT (-122.12706 37.69757)
8,UNION CITY,,,BART,ALA,POINT (-122.01715 37.59087)
9,LAKE MERRITT,,,BART,ALA,POINT (-122.26554 37.79768)


Now I'm going to fill in the Location and Code columns. I'll do this by using .loc again.
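One detail worth knowing before assigning with .loc: a label slice such as 0:9 includes both ends, unlike ordinary Python slicing. The sketch below uses a three-row toy table (hypothetical names `toy` and `first_two`) to show the difference between .loc and .iloc slices.

```python
import pandas as pd

toy = pd.DataFrame({
    "STATION": ["ASHBY", "MACARTHUR", "HAYWARD"],
    "LOCATION": ["", "", ""],
})

# Label slice 0:2 covers rows 0, 1 AND 2, so three values are needed
toy.loc[0:2, "LOCATION"] = ["Berkeley", "Oakland", "Hayward"]

# Positional slice 0:2 stops before position 2, so it covers only two rows
first_two = toy.iloc[0:2]

print(toy)
print(len(first_two))
```

So with ten stations numbered 0 to 9, .loc[0:9] needs a list of exactly ten values.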

In [21]:
# rows 0 through 9 (the end label is included), one city per station
stations.loc[0:9, 'LOCATION'] = ['Berkeley',
                                 'Oakland',
                                 'Hayward',
                                 'Hayward',
                                 'Oakland',
                                 'Berkeley',
                                 'Oakland',
                                 'San Leandro',
                                 'Union City',
                                 'Oakland']

In [22]:
stations.head(10)

Unnamed: 0,STATION,LOCATION,CODE,COMM_NETWO,CO,geometry
0,ASHBY,Berkeley,,BART,ALA,POINT (-122.27012 37.85321)
1,MACARTHUR,Oakland,,BART,ALA,POINT (-122.26712 37.82871)
2,HAYWARD,Hayward,,BART,ALA,POINT (-122.08720 37.67012)
3,SOUTH HAYWARD,Hayward,,BART,ALA,POINT (-122.05704 37.63448)
4,FRUITVALE,Oakland,,BART,ALA,POINT (-122.22420 37.77456)
5,NORTH BERKELEY,Berkeley,,BART,ALA,POINT (-122.28335 37.87406)
6,19TH STREET/OAKLAND,Oakland,,BART,ALA,POINT (-122.26839 37.80808)
7,BAY FAIR,San Leandro,,BART,ALA,POINT (-122.12706 37.69757)
8,UNION CITY,Union City,,BART,ALA,POINT (-122.01715 37.59087)
9,LAKE MERRITT,Oakland,,BART,ALA,POINT (-122.26554 37.79768)


Great! Now I'm going to fill in the code column so that it has an abbreviation of the location.

In [23]:
stations.loc[stations.LOCATION == "Berkeley", "CODE"] = "BKY"
stations.loc[stations.LOCATION == "Oakland", "CODE"] = "OAK"
stations.loc[stations.LOCATION == "Hayward", "CODE"] = "HAY"
stations.loc[stations.LOCATION == "Union City", "CODE"] = "UNC"
stations.loc[stations.LOCATION == "San Leandro", "CODE"] = "SAN"
stations.head(10)

Unnamed: 0,STATION,LOCATION,CODE,COMM_NETWO,CO,geometry
0,ASHBY,Berkeley,BKY,BART,ALA,POINT (-122.27012 37.85321)
1,MACARTHUR,Oakland,OAK,BART,ALA,POINT (-122.26712 37.82871)
2,HAYWARD,Hayward,HAY,BART,ALA,POINT (-122.08720 37.67012)
3,SOUTH HAYWARD,Hayward,HAY,BART,ALA,POINT (-122.05704 37.63448)
4,FRUITVALE,Oakland,OAK,BART,ALA,POINT (-122.22420 37.77456)
5,NORTH BERKELEY,Berkeley,BKY,BART,ALA,POINT (-122.28335 37.87406)
6,19TH STREET/OAKLAND,Oakland,OAK,BART,ALA,POINT (-122.26839 37.80808)
7,BAY FAIR,San Leandro,SAN,BART,ALA,POINT (-122.12706 37.69757)
8,UNION CITY,Union City,UNC,BART,ALA,POINT (-122.01715 37.59087)
9,LAKE MERRITT,Oakland,OAK,BART,ALA,POINT (-122.26554 37.79768)


It worked! In the code above, I used .loc to set the value in the Code column for every row whose Location matches a given city. For example, every row where Location is Berkeley gets the code BKY. This saves me time because several stations share the same city.

Now I'm going to make a new column to signify if that particular station is a TOD project location or not. For simplicity's sake, I'm going to just rename the comm_netwo column and put the values in there. That column really isn't that useful as it is, since every remaining row just says BART. Plus, this way I can rename all the columns to make them clearer anyway.
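As a rough sketch of the same idea, the toy example below fills one code with a boolean mask, as in the cell above, and then shows an alternative that does every city in one step with a dictionary and Series.map. The names `toy`, `city_codes`, and `CODE_MAP` are made up for illustration; this is not what the notebook itself runs.

```python
import pandas as pd

toy = pd.DataFrame({"LOCATION": ["Berkeley", "Oakland", "Hayward", "Oakland"]})
toy["CODE"] = ""

# Boolean-mask form: rows where the comparison is True get the value
toy.loc[toy["LOCATION"] == "Berkeley", "CODE"] = "BKY"

# Alternative: a lookup table from city name to code (hypothetical helper dict)
city_codes = {"Berkeley": "BKY", "Oakland": "OAK", "Hayward": "HAY"}

# map() looks each city up in the dictionary; a city that is missing
# from the dictionary would come back as NaN
toy["CODE_MAP"] = toy["LOCATION"].map(city_codes)
print(toy)
```

The mask form is easy to read one city at a time, while the dictionary form keeps all the pairs in one place if the list of cities grows.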

In [24]:
stations.columns = ['Station',
                    'City',
                    'Code',
                    'TOD',
                    'County',
                    'geometry']
stations.head(10)

Unnamed: 0,Station,City,Code,TOD,County,geometry
0,ASHBY,Berkeley,BKY,BART,ALA,POINT (-122.27012 37.85321)
1,MACARTHUR,Oakland,OAK,BART,ALA,POINT (-122.26712 37.82871)
2,HAYWARD,Hayward,HAY,BART,ALA,POINT (-122.08720 37.67012)
3,SOUTH HAYWARD,Hayward,HAY,BART,ALA,POINT (-122.05704 37.63448)
4,FRUITVALE,Oakland,OAK,BART,ALA,POINT (-122.22420 37.77456)
5,NORTH BERKELEY,Berkeley,BKY,BART,ALA,POINT (-122.28335 37.87406)
6,19TH STREET/OAKLAND,Oakland,OAK,BART,ALA,POINT (-122.26839 37.80808)
7,BAY FAIR,San Leandro,SAN,BART,ALA,POINT (-122.12706 37.69757)
8,UNION CITY,Union City,UNC,BART,ALA,POINT (-122.01715 37.59087)
9,LAKE MERRITT,Oakland,OAK,BART,ALA,POINT (-122.26554 37.79768)


In [25]:
stations.loc[0:4, 'TOD'] = 'Yes' # rows 0-4 (both ends included) become Yes in the TOD column because all of these are TOD
stations.loc[5:9, 'TOD'] = 'No' # the rest are not TOD

In [26]:
stations.head(10)

Unnamed: 0,Station,City,Code,TOD,County,geometry
0,ASHBY,Berkeley,BKY,Yes,ALA,POINT (-122.27012 37.85321)
1,MACARTHUR,Oakland,OAK,Yes,ALA,POINT (-122.26712 37.82871)
2,HAYWARD,Hayward,HAY,Yes,ALA,POINT (-122.08720 37.67012)
3,SOUTH HAYWARD,Hayward,HAY,Yes,ALA,POINT (-122.05704 37.63448)
4,FRUITVALE,Oakland,OAK,Yes,ALA,POINT (-122.22420 37.77456)
5,NORTH BERKELEY,Berkeley,BKY,No,ALA,POINT (-122.28335 37.87406)
6,19TH STREET/OAKLAND,Oakland,OAK,No,ALA,POINT (-122.26839 37.80808)
7,BAY FAIR,San Leandro,SAN,No,ALA,POINT (-122.12706 37.69757)
8,UNION CITY,Union City,UNC,No,ALA,POINT (-122.01715 37.59087)
9,LAKE MERRITT,Oakland,OAK,No,ALA,POINT (-122.26554 37.79768)


Great! Now I'm going to make a "Marker" column so that we can distinguish between the TOD and non-TOD stations on a map. 

In [27]:
stations['Marker']= "x"
stations.head()

Unnamed: 0,Station,City,Code,TOD,County,geometry,Marker
0,ASHBY,Berkeley,BKY,Yes,ALA,POINT (-122.27012 37.85321),x
1,MACARTHUR,Oakland,OAK,Yes,ALA,POINT (-122.26712 37.82871),x
2,HAYWARD,Hayward,HAY,Yes,ALA,POINT (-122.08720 37.67012),x
3,SOUTH HAYWARD,Hayward,HAY,Yes,ALA,POINT (-122.05704 37.63448),x
4,FRUITVALE,Oakland,OAK,Yes,ALA,POINT (-122.22420 37.77456),x


In [28]:
stations.loc[stations.TOD == "Yes", "Marker"] = "D" #marker symbol of a diamond
stations.loc[stations.TOD == "No", "Marker"] = "o" #marker symbol of a circle
stations.head()

Unnamed: 0,Station,City,Code,TOD,County,geometry,Marker
0,ASHBY,Berkeley,BKY,Yes,ALA,POINT (-122.27012 37.85321),D
1,MACARTHUR,Oakland,OAK,Yes,ALA,POINT (-122.26712 37.82871),D
2,HAYWARD,Hayward,HAY,Yes,ALA,POINT (-122.08720 37.67012),D
3,SOUTH HAYWARD,Hayward,HAY,Yes,ALA,POINT (-122.05704 37.63448),D
4,FRUITVALE,Oakland,OAK,Yes,ALA,POINT (-122.22420 37.77456),D
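
To give a sense of how the Marker column could be used later, here is a minimal sketch with plain matplotlib rather than geopandas. It assumes four of the stations and their coordinates copied from the tables above into hypothetical `lon` and `lat` columns, and it draws each marker group with its own scatter call.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Four stations with coordinates copied from the tables above
toy = pd.DataFrame({
    "Station": ["ASHBY", "FRUITVALE", "NORTH BERKELEY", "BAY FAIR"],
    "TOD": ["Yes", "Yes", "No", "No"],
    "lon": [-122.27012, -122.22420, -122.28335, -122.12706],
    "lat": [37.85321, 37.77456, 37.87406, 37.69757],
})
toy["Marker"] = toy["TOD"].map({"Yes": "D", "No": "o"})

fig, ax = plt.subplots()

# scatter() accepts a single marker style per call,
# so each marker group is drawn separately
for marker, group in toy.groupby("Marker"):
    ax.scatter(group["lon"], group["lat"], marker=marker,
               label="TOD: " + group["TOD"].iloc[0])

ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.legend()
```

Diamonds then show the TOD stations and circles show the controls, with a legend entry for each group.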


In [29]:
stations.to_file("stations.geojson", driver='GeoJSON')

In the code above I saved the geodataframe so that I can send it to Chaithra and use it for later projects.

Now let us create functions to display whether a particular station is TOD or not. We will do this step by step, starting with a simple function to see how this might work.

In [30]:
def TOD(location):
    print(location + ' ' + 'is a TOD station in Alameda County California.')

In [31]:
TOD_stations = ['Ashby', 'MacArthur', 'Hayward', 'South Hayward', 'Fruitvale'] 

In [32]:
for station in TOD_stations:
    TOD(location=station)


Ashby is a TOD station in Alameda County California.
MacArthur is a TOD station in Alameda County California.
Hayward is a TOD station in Alameda County California.
South Hayward is a TOD station in Alameda County California.
Fruitvale is a TOD station in Alameda County California.


That works perfectly. Now for a slightly more refined function. Here we will pass in the index of a row and check whether that station is TOD or not. We will then call this function in a loop. 

In [33]:
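def TODcheck(s):
    # s is a row position; it matches the index labels because the index was reset to 0-9
    name = stations.iloc[s, 0]  # column 0 holds the station name
    if stations.iloc[s]["TOD"] == "Yes":
        print(name, "is a TOD Station.")
    else:
        print(name, "is not a TOD Station.")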

In [34]:
for i in stations.index:
    TODcheck(i)
    

ASHBY is a TOD Station.
MACARTHUR is a TOD Station.
HAYWARD is a TOD Station.
SOUTH HAYWARD is a TOD Station.
FRUITVALE is a TOD Station.
NORTH BERKELEY is not a TOD Station.
19TH STREET/OAKLAND is not a TOD Station.
BAY FAIR is not a TOD Station.
UNION CITY is not a TOD Station.
LAKE MERRITT is not a TOD Station.


This prints the TOD status of each of our ten stations. 
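A possible variation, sketched here on a two-row toy table, is a function that takes the row itself and returns the sentence instead of printing it. The names `toy`, `tod_message`, and `messages` are hypothetical and not part of the notebook. Looping with iterrows hands each row to the function directly, so no positional lookup is needed.

```python
import pandas as pd

toy = pd.DataFrame({
    "Station": ["ASHBY", "NORTH BERKELEY"],
    "TOD": ["Yes", "No"],
})

def tod_message(row):
    """Build the status sentence for one row (hypothetical helper)."""
    if row["TOD"] == "Yes":
        return f"{row['Station']} is a TOD Station."
    return f"{row['Station']} is not a TOD Station."

# iterrows() yields (index label, row) pairs; the label is not needed here
messages = [tod_message(row) for _, row in toy.iterrows()]
for message in messages:
    print(message)
```

Returning the text rather than printing it makes the function easier to reuse, for example as a map label.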

In [35]:
stations.head(10)

Unnamed: 0,Station,City,Code,TOD,County,geometry,Marker
0,ASHBY,Berkeley,BKY,Yes,ALA,POINT (-122.27012 37.85321),D
1,MACARTHUR,Oakland,OAK,Yes,ALA,POINT (-122.26712 37.82871),D
2,HAYWARD,Hayward,HAY,Yes,ALA,POINT (-122.08720 37.67012),D
3,SOUTH HAYWARD,Hayward,HAY,Yes,ALA,POINT (-122.05704 37.63448),D
4,FRUITVALE,Oakland,OAK,Yes,ALA,POINT (-122.22420 37.77456),D
5,NORTH BERKELEY,Berkeley,BKY,No,ALA,POINT (-122.28335 37.87406),o
6,19TH STREET/OAKLAND,Oakland,OAK,No,ALA,POINT (-122.26839 37.80808),o
7,BAY FAIR,San Leandro,SAN,No,ALA,POINT (-122.12706 37.69757),o
8,UNION CITY,Union City,UNC,No,ALA,POINT (-122.01715 37.59087),o
9,LAKE MERRITT,Oakland,OAK,No,ALA,POINT (-122.26554 37.79768),o
