# GIS Data Formats


Because vector spatial data includes geometric objects and projection metadata, it generally can't be stored easily in ordinary tabular formats like CSVs. Instead, spatial data is usually stored in one of two main formats: shapefiles and GeoJSON files.

## Shapefiles 

Shapefiles are a very old, very simple format, kind of like the CSVs of spatial data. They are probably the most commonly passed-around vector data format.

Shapefiles actually consist of several files, all with the same name but different suffixes. For example, a shapefile of data on counties might consist of the following set of files: `counties.shp`, `counties.dbf`, `counties.prj`, `counties.shx`, etc. All shapefiles will contain a `.shp` file, but after that all bets are off on the number of files and the suffixes they will have.
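To see which files travel together, here is a minimal sketch that lists everything in a folder sharing a `.shp` file's base name. It uses only the standard library; the helper name `shapefile_components` and the `data/counties.shp` path are made up for illustration.

```python
from pathlib import Path


def shapefile_components(shp_path):
    """Return every file in the same folder that shares the .shp file's base name."""
    shp_path = Path(shp_path)
    return sorted(
        p
        for p in shp_path.parent.iterdir()
        if p.is_file() and p.stem == shp_path.stem
    )


# Hypothetical usage, assuming a folder "data" that holds the counties files:
# for component in shapefile_components("data/counties.shp"):
#     print(component.name)
```

A check like this can be handy before emailing or zipping a shapefile, since leaving one of the companion files behind is an easy mistake to make.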

To load a shapefile with geopandas, simply place all the shapefile files in a single folder and point the `gpd.read_file()` function at the `.shp` file -- geopandas will do the work of looking for other files with the same name in the same folder. 

Similarly, you can write a geodataframe (`gdf`) to a shapefile with the command `gdf.to_file("my_shapefile.shp")`. You'll just find that more than one file has been created.

The one thing to be aware of about shapefiles is that they restrict column names: names longer than 10 characters are truncated on write, and names containing punctuation may be altered as well.
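One way to avoid surprises is to check column names before writing. The sketch below flags names longer than 10 characters, which is the field-name limit of the `.dbf` file that holds a shapefile's attribute table. The helper `columns_at_risk` and the example column names are hypothetical, and it does not try to predict what the truncated names will look like.

```python
MAX_SHAPEFILE_FIELD_LENGTH = 10  # field-name limit of the .dbf attribute table


def columns_at_risk(columns, geometry_column="geometry"):
    """Return the column names that are too long to survive a shapefile write."""
    return [
        c
        for c in columns
        if c != geometry_column and len(str(c)) > MAX_SHAPEFILE_FIELD_LENGTH
    ]


# Made-up column names; with a real GeoDataFrame you would pass gdf.columns.
columns = ["name", "median_household_income", "geometry"]
print(columns_at_risk(columns))  # ['median_household_income']
```

If anything is flagged, renaming it yourself first (for example with `gdf.rename(columns={...})`) keeps the short names under your control rather than leaving them to the writer.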

## GeoJSON

An increasingly popular format for spatial data is GeoJSON. Unlike shapefiles, GeoJSON data lives in a single file with a `.geojson` suffix. Geopandas can read GeoJSON files with `gpd.read_file()` (the same function as above; the file format is detected automatically, so you don't need to say whether you are reading a shapefile or GeoJSON), and write them with `gdf.to_file("my_geojson.geojson", driver="GeoJSON")`.
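Part of GeoJSON's appeal is that the file is just JSON text. To give a feel for what is inside, here is a small sketch that builds a one-feature file with the standard library alone. The coordinates and the `name` property are made up; in practice you would let geopandas write the file for you.

```python
import json
import tempfile
from pathlib import Path

# A minimal FeatureCollection holding a single, made-up point.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-78.94, 36.0]},
            "properties": {"name": "example point"},
        }
    ],
}

path = Path(tempfile.mkdtemp()) / "my_geojson.geojson"
path.write_text(json.dumps(feature_collection, indent=2))

# Because it is plain JSON, any JSON parser can open it again.
loaded = json.loads(path.read_text())
print(loaded["features"][0]["geometry"]["type"])  # Point
```

Pointing `gpd.read_file()` at that path should give a one-row GeoDataFrame with a `name` column and a point geometry.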

## CSVs for Points

The one place where normal data formats may be used for spatial data is when dealing with points, since a point is fully specified by a single x-coordinate and a single y-coordinate. As a result, you may often find that point data comes to you in the form of a CSV. When you get this kind of data, the two coordinate columns can easily be turned into the geometry column of a GeoDataFrame with the `gpd.points_from_xy()` function, something we'll talk about in our readings on projections.
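As a preview, a minimal sketch of that conversion might look like the following. The CSV contents, the column names `longitude` and `latitude`, and the choice of `EPSG:4326` as the projection are all assumptions for the example; your own data may use different names and a different projection.

```python
import io

import geopandas as gpd
import pandas as pd

# Stand-in for a CSV file on disk; with a real file, pass its path to pd.read_csv().
csv_text = "name,longitude,latitude\nA,-78.9,36.0\nB,-79.05,35.91\n"
df = pd.read_csv(io.StringIO(csv_text))

# x is longitude and y is latitude, in that order.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",  # assumed: coordinates are plain latitude/longitude
)
print(gdf)
```

The original columns are kept alongside the new `geometry` column, so nothing from the CSV is lost.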

## Other Formats

There are some other formats in the world, such as [GeoPackages](https://en.wikipedia.org/wiki/GeoPackage), which are also supported by geopandas. In general, though, I'll admit I haven't really seen them in the wild...