# Abandoned Vehicles as Points

by Luc Anselin (anselin@uchicago.edu) (8/22/2016)

This notebook creates a csv file with the latitude and longitude of abandoned vehicles in Chicago. The data come from the 
Chicago open data portal.

Note: this is written with R beginners in mind; more seasoned R users can probably skip most of the comments.

For more extensive details about each function, see the R (or RStudio) help files.

Packages used:

- **lubridate**

### Read the csv file and turn into a data frame

- use the command **read.csv**

In [1]:
vehicall <- read.csv("Abandoned_Vehicles_Map.csv")

Check on the variable names (column headings) and check on the first few lines of the table, just to make sure all is well.

- use **names** to get the variable names

- use **head** to list the first few lines

In [2]:
names(vehicall)

In [3]:
head(vehicall)

| Creation.Date | Status | Completion.Date | Service.Request.Number | Type.of.Service.Request | License.Plate | Vehicle.Make.Model | Vehicle.Color | Current.Activity | Most.Recent.Action | ⋯ | Street.Address | ZIP.Code | X.Coordinate | Y.Coordinate | Ward | Police.District | Community.Area | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01/01/2011 | Completed - Dup | 01/07/2011 | 11-00002767 | Abandoned Vehicle Complaint | 0000000000 | Jeep/Cherokee | Red | | | ⋯ | 5629 N KEDVALE AVE | 60646 | 1147717 | 1937054 | 39 | 17 | 13 | 41.98368 | -87.73197 | (41.983680361597564, -87.7319663736746) |
| 01/01/2011 | Completed - Dup | 01/07/2011 | 11-00002779 | Abandoned Vehicle Complaint | REAR PLATE STARTS W/848 AND FRONT PLATE STARTS W/ K | Isuzu | Red | | | ⋯ | 5629 N KEDVALE AVE | 60646 | 1147717 | 1937054 | 39 | 17 | 13 | 41.98368 | -87.73197 | (41.983680361597564, -87.7319663736746) |
| 01/01/2011 | Completed - Dup | 01/20/2011 | 11-00003001 | Abandoned Vehicle Complaint | 9381880 | Toyota | Silver | | | ⋯ | 2053 N KILBOURN AVE | 60639 | 1146056 | 1913269 | 31 | 25 | 20 | 41.91859 | -87.73868 | (41.91858774162382, -87.73868431751842) |
| 01/01/2011 | Completed - Dup | 01/21/2011 | 11-00003309 | Abandoned Vehicle Complaint | MI S CS860 | Jeep/Cherokee | Gold | | | ⋯ | 736 W BUENA AVE | 60613 | 1170576 | 1928214 | 46 | 23 | 3 | 41.95861 | -87.64888 | (41.95860696269331, -87.64887590959788) |
| 01/01/2011 | Completed - Dup | 01/21/2011 | 11-00003316 | Abandoned Vehicle Complaint | MI SCS860 | | Gold | | | ⋯ | 736 W BUENA AVE | 60613 | 1170576 | 1928214 | 46 | 23 | 3 | 41.95861 | -87.64888 | (41.95860696269331, -87.64887590959788) |
| 01/01/2011 | Completed | 01/05/2011 | 11-00001976 | Abandoned Vehicle Complaint | H924236 | Ford | White | | | ⋯ | 6059 S KOMENSKY AVE | 60629 | 1150408 | 1864110 | 13 | 8 | 65 | 41.78237 | -87.72394 | (41.78237428405976, -87.72394038021173) |
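
The dotted variable names in the listing (such as **Creation.Date**) come from how **read.csv** treats column headings. The sketch below illustrates this on a tiny made-up extract typed inline; the headings with spaces are an assumption about what the raw file looks like, and `csv_text` and `toy` are hypothetical names used only for this illustration.

```r
# Hypothetical two-row extract typed inline; the real file is much larger.
csv_text <- "Creation Date,Status,ZIP Code
01/01/2011,Completed,60646
01/05/2011,Open,60639"

# text = lets read.csv parse a string instead of a file on disk.
toy <- read.csv(text = csv_text)

# By default, read.csv turns headings into syntactically valid R names,
# so spaces are replaced by dots.
names(toy)
head(toy, 1)
```

Valid names are what make the dollar-sign notation used in the following sections possible without extra quoting.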


### Create year and month variables as numeric

Create a new variable **credate** that converts **Creation.Date** to the R **Date** format, which can then be easily manipulated by specialized packages. We will use **lubridate**, which has the very convenient functions **year** and **month**. These functions extract the corresponding items from a **Date** object and return them as numeric values. The package also has many other convenient date manipulation functions, but those are beyond our scope here.

- first we create the variable in a date format using **as.Date**; we specify the variable as **Creation.Date**
and the format as **%m/%d/%Y**, matching what is used in the initial file (month/day/year, with the year in four digits) 

- if you are unfamiliar with R, note the use of the dollar sign to specify a variable in a given data frame; it may seem strange at first, but you will quickly get used to it

- again, we make sure all works as expected by listing the first few lines using **head**

In [4]:
vehicall$credate <- as.Date(vehicall$Creation.Date,"%m/%d/%Y")
head(vehicall$credate)

[1] "2011-01-01" "2011-01-01" "2011-01-01" "2011-01-01" "2011-01-01"
[6] "2011-01-01"
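
To see how the format string drives the conversion, here is a small sketch on made-up date strings (the vector `dates_chr` is hypothetical and not taken from the data). A string that does not match the format is not an error; it silently becomes **NA**, so it is worth counting missing values after converting.

```r
# Made-up strings: two in month/day/year layout, one in a different layout.
dates_chr <- c("01/07/2011", "09/15/2015", "2015-09-15")

# %m = month, %d = day, %Y = four-digit year, separated by slashes.
d <- as.Date(dates_chr, format = "%m/%d/%Y")

d             # the third element does not match the format and becomes NA
class(d)      # "Date"
sum(is.na(d)) # number of strings that failed to convert
```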

Make the **lubridate** package active with the **library** command. Make sure the package is installed first; if it is not, install it with **install.packages**.

In [5]:
library(lubridate)


Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date



We create two new variables, one for the year and one for the month by using the respective functions **year** and **month** from the **lubridate** package. Again, we will use **head** to check that all is OK.

In [6]:
vehicall$year <- year(vehicall$credate)
head(vehicall$year)

In [7]:
vehicall$month <- month(vehicall$credate)
head(vehicall$month)
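
If **lubridate** is not available, much the same result can be obtained in base R. This is only an alternative sketch, not the approach used in this notebook: it formats the date as text and converts that text to an integer. The vector `d` and the names `yr` and `mo` are made up for the illustration.

```r
# Two made-up dates in R's default year-month-day layout.
d <- as.Date(c("2011-01-01", "2015-09-02"))

# format() returns character strings such as "2015" or "09";
# as.integer() turns them into numbers (dropping the leading zero).
yr <- as.integer(format(d, "%Y"))
mo <- as.integer(format(d, "%m"))

yr
mo
```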

### Select observations for a given month/year

Now, we will use R's very powerful **[ , ]** subsetting to extract only those observations that match the
year and month criteria. For example, to select the observations (rows) for September 2015, we use **year == 2015**
and **month == 9**, combined with **&** so that both conditions must hold. We put each condition in parentheses
for clarity, but that is not necessary. Make sure not to forget the comma before the closing bracket: leaving the
column position empty ensures that all the variables (columns) are selected.

Again, we use **head** to see the first lines. In addition, we also use the **dim** command to show the dimensions of the resulting data frame. 

In [8]:
abandon_15_9 <- vehicall[ (vehicall$year == 2015) & (vehicall$month == 9), ]
head(abandon_15_9)
dim(abandon_15_9)

| | Creation.Date | Status | Completion.Date | Service.Request.Number | Type.of.Service.Request | License.Plate | Vehicle.Make.Model | Vehicle.Color | Current.Activity | Most.Recent.Action | ⋯ | Y.Coordinate | Ward | Police.District | Community.Area | Latitude | Longitude | Location | credate | year | month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 107034 | 09/01/2015 | Completed | 09/01/2015 | 15-04497171 | Abandoned Vehicle Complaint | E671293 | Chevrolet | Black | FVI - Outcome | Return to Owner - Vehicle | ⋯ | | 3 | 9 | 37 | 41.79584 | -87.63289 | (41.7958353436477, -87.63288785584677) | 2015-09-01 | 2015 | 9 |
| 107035 | 09/01/2015 | Completed | 09/01/2015 | 15-04499490 | Abandoned Vehicle Complaint | UNKNOWN | Jeep/Cherokee | Gray | FVI - Outcome | Vehicle was moved from original address requested | ⋯ | | 49 | 24 | 1 | 42.0145 | -87.67822 | (42.01449699912976, -87.67822416417681) | 2015-09-01 | 2015 | 9 |
| 107036 | 09/01/2015 | Completed | 09/01/2015 | 15-04501261 | Abandoned Vehicle Complaint | | (Unlisted Make) | Black | FVI - Outcome | Return to Owner - Vehicle | ⋯ | | 27 | 12 | 28 | 41.88544 | -87.66681 | (41.88544236574685, -87.66680842588522) | 2015-09-01 | 2015 | 9 |
| 107119 | 09/02/2015 | Completed - Dup | 09/02/2015 | 15-04520641 | Abandoned Vehicle Complaint | 9856420 | Honda | Blue | FVI - Outcome | Create Work Order | ⋯ | 1877350.0 | 4 | 2 | 36 | 41.81816 | -87.60088 | (41.81816098393261, -87.6008847868268) | 2015-09-02 | 2015 | 9 |
| 107120 | 09/02/2015 | Completed - Dup | 09/02/2015 | 15-04522485 | Abandoned Vehicle Complaint | 6401667 | Buick | Cream | FVI - Outcome | Create Work Order | ⋯ | 1858880.0 | 5 | 3 | 43 | 41.7678 | -87.58582 | (41.76779762303839, -87.58581991823232) | 2015-09-02 | 2015 | 9 |
| 107121 | 09/02/2015 | Completed - Dup | 09/02/2015 | 15-04525443 | Abandoned Vehicle Complaint | UNKNOWN | Buick | Blue | FVI - Outcome | Create Work Order | ⋯ | 1905690.0 | 26 | 12 | 24 | 41.89728 | -87.68703 | (41.897281897854185, -87.68702748042514) | 2015-09-02 | 2015 | 9 |


In this example, there are 2220 abandoned vehicles with a time stamp of September 2015, and 24 variables in the data frame.
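
The row selection can be tried out on a small made-up data frame. The sketch below (with hypothetical names `toy`, `keep`, `sel` and `sel_clean`) also shows one thing to watch for: if the year or month is missing for some row, for example because a date failed to convert, the condition evaluates to **NA** and the subset gains a row filled with **NA**. Wrapping the condition in **which** is one common way to avoid this.

```r
# Made-up data: five observations, one with a missing year.
toy <- data.frame(
  id    = 1:5,
  year  = c(2015, 2015, 2014, 2015, NA),
  month = c(9, 8, 9, 9, 9)
)

# Logical vector: TRUE only where both conditions hold.
keep <- (toy$year == 2015) & (toy$month == 9)
keep  # TRUE FALSE FALSE TRUE NA

# The NA in the condition produces an extra all-NA row here.
sel <- toy[keep, ]

# which() keeps only the positions that are TRUE, dropping the NA.
sel_clean <- toy[which(keep), ]

dim(sel)        # 3 rows, including the all-NA one
dim(sel_clean)  # 2 rows
```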

### Select desired variables

We will select only a few of the variables. Besides the date stamps, we will keep the **Ward**, **Police.District**,
**Community.Area**, **Latitude** and **Longitude**.

First, we create a vector that contains all these variable names (using the omnipresent **c** command). Note how
the variable names have to be in quotes (they are character strings).

In [9]:
vehvariables <- c("year","month","credate","Ward","Police.District","Community.Area","Latitude","Longitude")

Now, we do the column selection using the **[ , ]** subsetting of the data frame, passing the vector
**vehvariables** in the column position. Here we wrap the subsetting in the **with** command. This specifies
the data frame on which the commands are to operate, and is an alternative to the more traditional **attach**
and **detach** convention. You specify the name of the data frame to be used, followed by the command, in this
case the subsetting. Because the variable names are passed as quoted strings, the subsetting expression gives
the same result without **with**. Again, note the importance of leaving the position in front of the comma
empty to specify that all rows are to be selected.

In [10]:
veh2015_9 <- with(abandon_15_9,abandon_15_9[,vehvariables])

In [11]:
head(veh2015_9)
dim(veh2015_9)

| | year | month | credate | Ward | Police.District | Community.Area | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 107034 | 2015 | 9 | 2015-09-01 | 3 | 9 | 37 | 41.79584 | -87.63289 |
| 107035 | 2015 | 9 | 2015-09-01 | 49 | 24 | 1 | 42.0145 | -87.67822 |
| 107036 | 2015 | 9 | 2015-09-01 | 27 | 12 | 28 | 41.88544 | -87.66681 |
| 107119 | 2015 | 9 | 2015-09-02 | 4 | 2 | 36 | 41.81816 | -87.60088 |
| 107120 | 2015 | 9 | 2015-09-02 | 5 | 3 | 43 | 41.7678 | -87.58582 |
| 107121 | 2015 | 9 | 2015-09-02 | 26 | 12 | 24 | 41.89728 | -87.68703 |


Now, there are only 8 variables left out of the original 24.
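
The column selection can be checked on a small made-up data frame. In the sketch below, `toy` and `keep_cols` are hypothetical names. It compares the selection with and without **with**, and also illustrates a related behavior: selecting a single column this way returns a plain vector unless **drop = FALSE** is added.

```r
# Made-up data frame with four variables.
toy <- data.frame(
  year     = c(2015, 2015),
  month    = c(9, 9),
  Ward     = c(3, 49),
  Latitude = c(41.79584, 42.0145)
)

# Character vector of the columns to keep.
keep_cols <- c("year", "Ward")

a <- toy[, keep_cols]             # direct subsetting
b <- with(toy, toy[, keep_cols])  # same expression inside with()
identical(a, b)

# A single column is simplified to a vector by default;
# drop = FALSE keeps it as a one-column data frame.
one_vec <- toy[, "Ward"]
one_df  <- toy[, "Ward", drop = FALSE]
class(one_vec)
class(one_df)
```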

### Write a new csv file

We use the **write.csv** command, specifying the data frame and its file name. Note the importance of setting
**row.names = FALSE**, or else the original sequence numbers are added to the output file (those are the numbers
appearing in the left-most column of the data frame listing above). We will add our own identifier later in **GeoDa**.

Note: if a file with the specified file name already exists, it will be overwritten.

In [12]:
write.csv(veh2015_9,"vehicles2015_9.csv",row.names=FALSE)
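
To see the effect of **row.names = FALSE**, the following sketch writes a small made-up data frame twice and compares the header lines of the two files. It writes to temporary files (via **tempfile**) so that nothing in the working directory is touched; `toy`, `f_with` and `f_without` are hypothetical names for the illustration.

```r
# Made-up data frame whose row names mimic the original sequence numbers.
toy <- data.frame(
  year = c(2015, 2015),
  Ward = c(3, 49),
  row.names = c("107034", "107035")
)

f_with    <- tempfile(fileext = ".csv")
f_without <- tempfile(fileext = ".csv")

write.csv(toy, f_with)                        # default: row names are written
write.csv(toy, f_without, row.names = FALSE)  # row names are left out

readLines(f_with)[1]     # header starts with an empty name for the extra column
readLines(f_without)[1]  # header lists only the variables

# Reading the second file back gives just the two variables.
names(read.csv(f_without))
```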