# Introduction to GeoPandas

A considerable proportion of the data generated every day is inherently spatial. From Earth Observation and GPS data to the data behind all kinds of maps, spatial data (also known as geospatial data or geographic information) are data in which each record is associated with a specific location.


There are many tools suited for geospatial data science. In this notebook, we will focus on how to use the [GeoPandas](https://geopandas.org/en/v0.9.0/getting_started.html) library to work with geospatial data. We will learn how to read, write, and manipulate geospatial data using GeoPandas. We will also learn how to visualize geospatial data using GeoPandas, as well as some basic geospatial operations such as [spatial joins](https://geopandas.org/en/v0.9.0/docs/user_guide/mergingdata.html#spatial-joins).

## Basics of GeoPandas

GeoPandas, as its name suggests, is an extension of the [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html) library that helps you work with geospatial data. 

The main data structures in GeoPandas are `GeoSeries` and `GeoDataFrame`, which extend the capabilities of pandas [Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) and [DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

The main difference between a GeoDataFrame and a pandas DataFrame is that a GeoDataFrame contains a column of geometries (points, lines, polygons, multipolygons, etc.). By default, this column is named `'geometry'`.

[![image from Geopandas website](https://geopandas.org/en/v0.9.0/_images/dataframe.svg)](https://geopandas.org/en/v0.9.0/getting_started/introduction.html)
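As a minimal sketch of the structure described above, the following builds a small GeoDataFrame by hand. The city names, the approximate coordinates, and the `EPSG:4326` coordinate reference system are illustrative assumptions, not data used elsewhere in this notebook.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample data: three cities with approximate (lon, lat) coordinates.
cities = gpd.GeoDataFrame(
    {
        "name": ["Amsterdam", "Berlin", "Paris"],
        "geometry": [Point(4.90, 52.37), Point(13.40, 52.52), Point(2.35, 48.86)],
    },
    crs="EPSG:4326",  # assumption: coordinates are WGS 84 longitude/latitude
)

print(type(cities))           # a GeoDataFrame, which subclasses pandas.DataFrame
print(type(cities.geometry))  # the geometry column is a GeoSeries
print(cities.geometry.name)   # name of the active geometry column
```

Every ordinary column (here `name`) behaves as it would in pandas; only the geometry column carries the spatial information.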





## Installing GeoPandas

GeoPandas depends on several compiled geospatial libraries, so the project recommends installing it with the conda package manager to avoid dependency problems.

```bash
conda install geopandas
```

## Reading spatial data

To read spatial data, GeoPandas provides the `geopandas.read_file()` function. It detects the file type automatically and can read most common vector-based spatial data formats.

Some of the most common vector data formats are:

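- **Shapefile**. A long-standing industry standard and one of the most common vector data formats. A shapefile is really a set of files (at least `.shp`, `.shx`, and `.dbf`) that must be kept together.
- **GeoJSON**. A newer, JSON-based format for geospatial data, standardized as RFC 7946 in 2016. Unlike a shapefile, a GeoJSON dataset is a single file, making it easier to work with.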
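Because GeoJSON is plain JSON text, its structure is easy to inspect. The sketch below parses a small, made-up `FeatureCollection` (the feature names and coordinates are hypothetical) and converts its features into a GeoDataFrame with `GeoDataFrame.from_features`.

```python
import json

import geopandas as gpd

# Hypothetical GeoJSON document with one point and one line feature.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "sample point"},
      "geometry": {"type": "Point", "coordinates": [4.90, 52.37]}
    },
    {
      "type": "Feature",
      "properties": {"name": "sample line"},
      "geometry": {
        "type": "LineString",
        "coordinates": [[4.90, 52.37], [13.40, 52.52]]
      }
    }
  ]
}
"""

collection = json.loads(geojson_text)

# Each feature's "properties" become ordinary columns and its "geometry"
# becomes a shapely object in the geometry column.
gdf = gpd.GeoDataFrame.from_features(collection["features"], crs="EPSG:4326")
print(gdf)
```

In everyday use, `gpd.read_file()` performs this conversion directly from a file on disk.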


### Reading from a shapefile:
```python
import geopandas as gpd

# Placeholder path: keep the sidecar files (.shx, .dbf) next to the .shp file
gdf = gpd.read_file('path/to/shapefile.shp')
```


### Reading from a GeoJSON file:

```python
import geopandas as gpd

# Placeholder path: replace with the location of your GeoJSON file
gdf = gpd.read_file('path/to/geojson.geojson')
```

### Reading from a CSV file:
A CSV file has no native geometry type, so you first load it into a pandas DataFrame and then create a GeoDataFrame by supplying the geometry. How you build the geometry depends on your data:

- If you have longitude and latitude columns, create point geometries from them with the `gpd.points_from_xy` function.
- If you have a WKT (well-known text) column, which is a common way to store line and polygon geometries in a CSV, parse it with `gpd.GeoSeries.from_wkt` and pass the result to `gpd.GeoDataFrame` as the geometry.


For example, for a CSV file of points with `lon` and `lat` columns, the geometry column can be created from those coordinates using the `gpd.points_from_xy` function.



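```python
import geopandas as gpd
import pandas as pd

# Read the table with pandas so that lon and lat are parsed as numbers
df = pd.read_csv('path/to/csv.csv')
# Build point geometries: x is longitude, y is latitude
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lon, df.lat))
```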
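The same pattern applies when a CSV stores geometries as WKT. The sketch below uses an in-memory CSV with a hypothetical `wkt` column and made-up shapes, and parses it with `GeoSeries.from_wkt`, which is available in recent GeoPandas releases.

```python
import io

import geopandas as gpd
import pandas as pd

# Hypothetical CSV content: one line and one polygon stored as WKT strings.
csv_text = '''name,wkt
route,"LINESTRING (0 0, 1 1, 2 0)"
parcel,"POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))"
'''

df = pd.read_csv(io.StringIO(csv_text))

# Parse the WKT strings into shapely geometries.
geometry = gpd.GeoSeries.from_wkt(df["wkt"])

# Drop the raw text column and attach the parsed geometries instead.
gdf = gpd.GeoDataFrame(df.drop(columns="wkt"), geometry=geometry)
print(gdf.geom_type.tolist())
```

No coordinate reference system is set here because a bare WKT string does not carry one; if you know it, pass it through the `crs` argument.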


```python
# import GeoPandas
import geopandas as gpd
```



