# Using GeoPandas to Read and Write Vector Data

<img width="20%" src="https://geopandas.readthedocs.io/en/latest/_static/geopandas_logo_web.svg"></img>


GeoPandas is a popular Python library that simplifies working with geospatial data by extending the pandas library. It provides the data structures and functions needed to manipulate and analyze geometries such as points, lines, and polygons, and to perform spatial operations such as spatial joins, overlays, and projections.

GeoPandas builds upon several core Python libraries, including pandas, Shapely, Fiona, and pyproj. These dependencies provide the underlying functionality for handling geospatial data structures, file I/O, and coordinate transformations.

http://geopandas.readthedocs.io




We can import GeoPandas. Most developers import it as "gpd" to type less.

In [1]:
import geopandas as gpd

## Example 1: Reading Natural Earth Dataset

In the previous lecture we downloaded the Natural Earth vector dataset and looked at the airports and countries layers.

Let's do that again, but this time using GeoPandas.

In [2]:
filename = "geodata/packages/natural_earth_vector.gpkg"

airports = gpd.read_file(filename, layer="ne_10m_airports")

The variable "airports" is a GeoDataFrame. We can display it directly in JupyterLab:

The last column is "geometry" and contains the geometry of each feature. Compared to Fiona this is really easy!

In [3]:
airports

Unnamed: 0,scalerank,featurecla,type,name,abbrev,location,gps_code,iata_code,wikipedia,natlscale,...,name_vi,name_zh,wdid_score,ne_id,name_fa,name_he,name_uk,name_ur,name_zht,geometry
0,9,Airport,small,Sahnewal,LUH,terminal,VILD,LUH,http://en.wikipedia.org/wiki/Sahnewal_Airport,8.0,...,,,4.0,1159113785,فرودگاه سهنول,,,,,POINT (75.95707 30.85036)
1,9,Airport,mid,Solapur,SSE,terminal,VASL,SSE,http://en.wikipedia.org/wiki/Solapur_Airport,8.0,...,,,4.0,1159113803,فرودگاه سولاپور,,,,,POINT (75.93306 17.62542)
2,9,Airport,mid,Birsa Munda,IXR,terminal,VERC,IXR,http://en.wikipedia.org/wiki/Birsa_Munda_Airport,8.0,...,Sân bay Birsa Munda,比尔萨·蒙达机场,4.0,1159113831,فرودگاه بیرسا موندا,,,,比爾薩·蒙達機場,POINT (85.32360 23.31772)
3,9,Airport,mid,Ahwaz,AWZ,terminal,OIAW,AWZ,http://en.wikipedia.org/wiki/Ahwaz_Airport,8.0,...,Sân bay Ahvaz,阿瓦士机场,4.0,1159113845,فرودگاه بین المللی اهواز,,,,阿瓦士機場,POINT (48.74711 31.34316)
4,9,Airport,mid and military,Gwalior,GWL,terminal,VIGR,GWL,http://en.wikipedia.org/wiki/Gwalior_Airport,8.0,...,Sân bay Gwalior,辛迪亚航空站,4.0,1159113863,فرودگاه گوالیور,,,,辛迪亞航空站,POINT (78.21722 26.28549)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
888,2,Airport,major,Arlanda,ARN,terminal,ESSA,ARN,http://en.wikipedia.org/wiki/Stockholm-Arlanda...,150.0,...,Sân bay Stockholm-Arlanda,斯德哥尔摩－阿兰达机场,4.0,1159127877,فرودگاه استکهلم-آرلاندا,נמל התעופה סטוקהולם ארלנדה,Стокгольм-Арланда,اسٹاک ہوم ارلینڈا ہوائی اڈا,斯德哥爾摩－阿蘭達機場,POINT (17.93073 59.65112)
889,2,Airport,major,Soekarno-Hatta Int'l,CGK,parking,WIII,CGK,http://en.wikipedia.org/wiki/Soekarno-Hatta_In...,150.0,...,Sân bay quốc tế Soekarno-Hatta,苏加诺－哈达国际机场,4.0,1159127891,فرودگاه بینالمللی سوئکارنو-هتا,נמל התעופה סואקרנו-האטה,Сукарно-Хатта,سوکارنو-ہاتا بین الاقوامی ہوائی اڈا,蘇加諾－哈達國際機場,POINT (106.65430 -6.12660)
890,2,Airport,major,Eleftherios Venizelos Int'l,ATH,terminal,LGAV,ATH,http://en.wikipedia.org/wiki/Athens_Internatio...,150.0,...,Sân bay quốc tế Athena,雅典埃莱夫塞里奥斯·韦尼泽洛斯国际机场,2.0,1159127903,فرودگاه بینالمللی آتن,נמל התעופה הבינלאומי אתונה-אלפתריוס וניזלוס,Міжнародний аеропорт «Елефтеріос Венізелос»,,雅典埃萊夫塞里奧斯·韋尼澤洛斯國際機場,POINT (23.94712 37.93623)
891,2,Airport,major,Tokyo Int'l,HND,terminal,RJTT,HND,https://en.wikipedia.org/wiki/Haneda_Airport,150.0,...,Sân bay quốc tế Tokyo,東京國際機場,,1729942773,فرودگاه هانهدا,נמל התעופה טוקיו האנדה,Міжнародний аеропорт Токіо,ہانیدا ہوائی اڈا,東京國際機場,POINT (139.78405 35.54906)


There are 41 columns and 893 rows. The GeoDataFrame also has an attribute "shape" that returns this information as a (rows, columns) tuple:

In [4]:
airports.shape

(893, 41)

We can create a new dataframe with fewer columns by listing the columns we want. We should always keep the **geometry** column, otherwise the result is no longer a GeoDataFrame.

In [5]:
airports2 = airports[['scalerank', 'type', 'name','iata_code', 'geometry']] 

In [6]:
airports2

Unnamed: 0,scalerank,type,name,iata_code,geometry
0,9,small,Sahnewal,LUH,POINT (75.95707 30.85036)
1,9,mid,Solapur,SSE,POINT (75.93306 17.62542)
2,9,mid,Birsa Munda,IXR,POINT (85.32360 23.31772)
3,9,mid,Ahwaz,AWZ,POINT (48.74711 31.34316)
4,9,mid and military,Gwalior,GWL,POINT (78.21722 26.28549)
...,...,...,...,...,...
888,2,major,Arlanda,ARN,POINT (17.93073 59.65112)
889,2,major,Soekarno-Hatta Int'l,CGK,POINT (106.65430 -6.12660)
890,2,major,Eleftherios Venizelos Int'l,ATH,POINT (23.94712 37.93623)
891,2,major,Tokyo Int'l,HND,POINT (139.78405 35.54906)
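
The effect of keeping or dropping the geometry column can be checked on a small table. The following is a minimal sketch, assuming a recent GeoPandas version; the GeoDataFrame `demo` is built by hand for illustration (it is not the Natural Earth layer), and the variable names are made up.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Tiny hand-made table, only for illustration (not the Natural Earth data).
demo = gpd.GeoDataFrame(
    {
        "name": ["Zurich Int'l", "Arlanda"],
        "iata_code": ["ZRH", "ARN"],
        "geometry": [Point(8.56221, 47.45239), Point(17.93073, 59.65112)],
    },
    crs="EPSG:4326",
)

# Selection that keeps the geometry column.
with_geometry = demo[["name", "geometry"]]

# Selection that drops the geometry column.
without_geometry = demo[["name", "iata_code"]]

print(type(with_geometry).__name__)
print(type(without_geometry).__name__)
```

The first selection should still be a GeoDataFrame, while the second one should fall back to a plain pandas DataFrame without spatial methods.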


We can also call .head(n) to display the first n entries:

In [10]:
airports2.head(10)

Unnamed: 0,scalerank,type,name,iata_code,geometry
0,9,small,Sahnewal,LUH,POINT (75.95707 30.85036)
1,9,mid,Solapur,SSE,POINT (75.93306 17.62542)
2,9,mid,Birsa Munda,IXR,POINT (85.32360 23.31772)
3,9,mid,Ahwaz,AWZ,POINT (48.74711 31.34316)
4,9,mid and military,Gwalior,GWL,POINT (78.21722 26.28549)
5,9,mid,Hodeidah Int'l,HOD,POINT (42.97110 14.75525)
6,9,mid,Devi Ahilyabai Holkar Int'l,IDR,POINT (75.80929 22.72775)
7,9,mid,Gandhinagar,ISK,POINT (73.81057 19.96602)
8,9,major and military,Chandigarh Int'l,IXC,POINT (76.80173 30.67072)
9,9,mid,Aurangabad,IXU,POINT (75.39584 19.86730)


### Sorting

Sorting is very easy. You only have to specify the column to sort by:

In [11]:
airports2.sort_values(by="name", ascending=True)

Unnamed: 0,scalerank,type,name,iata_code,geometry
448,6,mid,Aba Tenna D. Yilma Int'l,DIR,POINT (41.85776 9.61268)
21,9,mid and military,Abdul Rachman Saleh,MLG,POINT (112.71142 -7.92998)
626,4,mid,Abidjan Port Bouet,ABJ,POINT (-3.93222 5.25440)
554,6,major,Abu Dhabi Int'l,AUH,POINT (54.64633 24.42723)
565,5,major,Abuja Int'l,ABV,POINT (7.27026 9.00438)
...,...,...,...,...,...
308,7,major,Zvartnots Int'l,EVN,POINT (44.40006 40.15237)
381,7,major,Ürümqi Diwopu Int'l,URC,POINT (87.46713 43.89834)
231,8,mid,Łódź Władysław Reymont,LCJ,POINT (19.40321 51.72721)
49,8,major,Şakirpaşa,ADA,POINT (35.29696 36.98521)
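
"sort_values" is inherited from pandas, so it also accepts several columns and a sort direction per column. The sketch below uses a plain pandas DataFrame with a few hand-picked rows (an assumption made to keep it self-contained); the same call should work unchanged on a GeoDataFrame.

```python
import pandas as pd

# Small illustrative table, not the full airports layer.
demo = pd.DataFrame(
    {
        "scalerank": [6, 2, 5, 2],
        "name": ["Abu Dhabi Int'l", "Schiphol", "Abuja Int'l", "Arlanda"],
    }
)

# Sort by scalerank ascending; break ties by name in descending order.
ordered = demo.sort_values(by=["scalerank", "name"], ascending=[True, False])

# sort_values returns a new object; demo itself keeps its original order.
print(ordered)
print(demo)
```

To keep the sorted result, assign it to a variable as done here with `ordered`.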


### Queries

We can filter rows with "query", which takes the condition as a string:

In [12]:
airports2.query("scalerank == 2")

Unnamed: 0,scalerank,type,name,iata_code,geometry
828,2,major,Hong Kong Int'l,HKG,POINT (113.93502 22.31533)
829,2,major,Taoyuan,TPE,POINT (121.23137 25.07674)
830,2,major,Schiphol,AMS,POINT (4.76438 52.30893)
831,2,major,Singapore Changi,SIN,POINT (103.98641 1.35616)
832,2,major,London Heathrow,LHR,POINT (-0.45316 51.47100)
...,...,...,...,...,...
888,2,major,Arlanda,ARN,POINT (17.93073 59.65112)
889,2,major,Soekarno-Hatta Int'l,CGK,POINT (106.65430 -6.12660)
890,2,major,Eleftherios Venizelos Int'l,ATH,POINT (23.94712 37.93623)
891,2,major,Tokyo Int'l,HND,POINT (139.78405 35.54906)


In [13]:
airports2.query("iata_code == 'ZRH'  ")

Unnamed: 0,scalerank,type,name,iata_code,geometry
823,3,major,Zurich Int'l,ZRH,POINT (8.56221 47.45239)
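
A query string is one of several ways to filter rows. The sketch below compares it with a boolean mask and shows how a Python variable can be referenced with "@" inside the query string. It uses a plain pandas DataFrame with invented rows to stay self-contained; the names `demo`, `code`, and `major_hubs` are illustrative only.

```python
import pandas as pd

# Small illustrative table, not the full airports layer.
demo = pd.DataFrame(
    {
        "scalerank": [2, 2, 3, 9],
        "name": ["Schiphol", "Arlanda", "Zurich Int'l", "Sahnewal"],
        "iata_code": ["AMS", "ARN", "ZRH", "LUH"],
    }
)

code = "ZRH"

# "@code" refers to the Python variable defined above.
by_query = demo.query("iata_code == @code")

# The same filter written as a boolean mask.
by_mask = demo[demo["iata_code"] == code]

# Conditions can be combined with "and" / "or".
major_hubs = demo.query("scalerank <= 2 and name != 'Arlanda'")

print(by_query)
print(major_hubs)
```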


## Example 2: Creating a GeoPandas dataframe manually and exporting it

Let's assume we have some data and want to create a GeoDataFrame from scratch.

We have a list of mountain peaks in Switzerland, where each entry has the format [latitude, longitude, name, elevation].

In [None]:
data = [[45.922513343092916, 7.835574679184418,'Liskamm',4527],
[45.941997570720375, 7.869820276613906,'Nordend',4609],
[46.10902325837147, 7.863895545667632,'Nadelhorn',4327],
[45.932186337151684, 7.8714190183674555,'Zumsteinspitze',4563],
[46.08336532442726, 7.857296913890337,'Täschhorn',4491],
[45.91669904679932, 7.863563975062021,'Ludwigshöhe',4341],
[45.93756139078208, 7.299279971077615,'Grand Combin de Grafeneire',4314],
[45.922513343092916, 7.835574679184418,'Lyskamm',4527],
[45.93683662540408, 7.866814344981748,'Dufourspitze (Pointe Dufour)',4634],
[46.10129664518156, 7.716156885858494,'Weisshorn',4506],
[45.976340506120614, 7.658691510512221,'Monte Cervino',4478],
[45.976340506120614, 7.658691510512221,'Matterhorn',4478],
[45.93674004101607, 7.86855410887458,'Grenzgipfel',4618],
[45.92712756883081, 7.876921984235257,'Signalkuppe (Punta Gnifetti)',4554],
[46.093839189553464, 7.858928716434883,'Dom',4545],
[46.107109586833495, 7.711724522200983,'Grand Gendarme',4331],
[45.919638502715564, 7.8711910872756405,'Parrotspitze',4432],
[46.093839189553464, 7.858928716434883,'Mischabel',4545],
[46.03426257063022, 7.61204033560156,'Dent Blanche',4357]]

The first step is to convert this row-based list into a column-based dictionary, e.g.


    data_as_dict = {"Longitude": [list of longitudes],
                    "Latitude":  [list of latitudes],
                    "Name": [list of names],
                    "Elevation": [list of elevation values]}
    
    
So let's create a dictionary with empty lists and add the data to it:

In [None]:
data_as_dict = {"Longitude": [],
                "Latitude": [],
                "Name": [],
                "Elevation": []}

for row in data:
    data_as_dict["Latitude"].append(row[0])
    data_as_dict["Longitude"].append(row[1])
    data_as_dict["Name"].append(row[2])
    data_as_dict["Elevation"].append(row[3])

Now we have a dictionary with the data in "columns":

In [None]:
print(data_as_dict)
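
The same row-to-column conversion can also be written without an explicit loop. The following is a minimal sketch using the built-in `zip`; it uses only two of the peaks from the list above, and the names `rows` and `columns` are chosen here for illustration.

```python
# Two rows in the same [latitude, longitude, name, elevation] layout.
rows = [
    [45.941997570720375, 7.869820276613906, "Nordend", 4609],
    [46.093839189553464, 7.858928716434883, "Dom", 4545],
]

# zip(*rows) transposes the list: one tuple per column instead of one per row.
latitudes, longitudes, names, elevations = zip(*rows)

columns = {
    "Longitude": list(longitudes),
    "Latitude": list(latitudes),
    "Name": list(names),
    "Elevation": list(elevations),
}

print(columns)
```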

Now we can create a pandas (not geopandas) dataframe out of it:

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(data_as_dict)

In [None]:
df.head(12)

Isn't that great? The only problem is that this is not yet a GeoDataFrame. To create one we need the Python module Shapely (which will be introduced in Lecture 11). Shapely provides classes such as Point, LineString, and Polygon, which we can use to represent the geometry.

In [None]:
from shapely import Point

In [None]:
data_as_dict_geo = {"Name": [],
                "Elevation": [],
                "geometry": []}

for row in data:
    p = Point(row[1],row[0]) # first longitude, then latitude
    data_as_dict_geo["geometry"].append(p)
    data_as_dict_geo["Name"].append(row[2])
    data_as_dict_geo["Elevation"].append(row[3])

In [None]:
print(data_as_dict_geo)

Now we can create a GeoDataFrame out of it. The option "crs" specifies the coordinate reference system which is WGS84. (We will learn about that in the next lecture.)

In [None]:
gdf = gpd.GeoDataFrame(data_as_dict_geo, crs="EPSG:4326")

In [None]:
gdf.head()
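
If the coordinates already sit in the columns of a pandas DataFrame, GeoPandas can build the point geometries directly with `gpd.points_from_xy`. The sketch below is an alternative under that assumption, shown with two peaks only; the variable name `peaks` is made up for this example.

```python
import geopandas as gpd
import pandas as pd

df = pd.DataFrame(
    {
        "Longitude": [7.869820276613906, 7.858928716434883],
        "Latitude": [45.941997570720375, 46.093839189553464],
        "Name": ["Nordend", "Dom"],
        "Elevation": [4609, 4545],
    }
)

# points_from_xy expects x (longitude) first, then y (latitude).
peaks = gpd.GeoDataFrame(
    df[["Name", "Elevation"]],
    geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]),
    crs="EPSG:4326",
)

print(peaks)
```

As with the Point constructor, the order is longitude first and latitude second.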

Now we can export the GeoDataFrame. We can use any format that Fiona supports for writing. The available drivers can be listed like this:

    import fiona
    
    fiona.supported_drivers



In [None]:
gdf.to_file("geodata/mountainpeaks.shp", driver="ESRI Shapefile")

Now we could open the exported file in QGIS.

## Example 3: USGS Earthquake Data

The United States Geological Survey (USGS) provides earthquake data in GeoJSON format through their Earthquake Hazards Program. This data contains information about recent and historical earthquakes, including their locations, magnitudes, depths, and other relevant attributes.

The USGS earthquake data is available through their API, which allows you to query and filter the data based on various criteria, such as time range, magnitude range, and geographic region.

In [None]:
import requests

url = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_week.geojson"
#url = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/significant_month.geojson"
#url = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.geojson"

response = requests.get(url, timeout=30)
response.raise_for_status()  # stop here if the download failed
with open("geodata/earthquakes.geojson", "wb") as file:
    file.write(response.content)

In [None]:
quakes = gpd.read_file("geodata/earthquakes.geojson")
quakes.head(3)

In [None]:
quakes = quakes[["time","mag", "place","geometry"]].copy()
quakes.head()

In [None]:
quakes.mag.hist(bins=16);

In [None]:
from datetime import datetime, timezone
datetime.fromtimestamp(1602758052977/1000, timezone.utc) # USGS time is in **milliseconds** since 1.1.1970, so divide by 1000 to get seconds
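
The conversion from milliseconds to a UTC timestamp can also be done without floating-point division. The sketch below uses only the standard library; the helper name `ms_to_utc` is hypothetical and not part of any library.

```python
from datetime import datetime, timedelta, timezone

# Start of the Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(milliseconds):
    """Convert milliseconds since 1970-01-01 to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=milliseconds)


stamp = ms_to_utc(1602758052977)
print(stamp.isoformat())
```

Adding a `timedelta` keeps the millisecond part exact, because the integer value is never converted to a float.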

In [None]:
from datetime import datetime, timezone

data = []
for row in range(0,len(quakes)):
    time = quakes.iloc[row].time
    t = str(datetime.fromtimestamp(time/1000.0, timezone.utc))
    data.append(t)


In [None]:
quakes["time_utc"] = data
quakes.head()
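
For larger tables, the row-by-row loop can be replaced by a single vectorized call. The following is a minimal sketch using `pd.to_datetime` on a small hand-made table; the column names mirror the earthquake data, but the values are invented for illustration. Unlike the loop, which stores strings, this produces a column with a datetime dtype.

```python
import pandas as pd

# Two made-up timestamps in milliseconds since 1970-01-01.
demo = pd.DataFrame({"time": [1602758052977, 1602761652977]})

# unit="ms" tells pandas how to interpret the integers; utc=True makes the result timezone-aware.
demo["time_utc"] = pd.to_datetime(demo["time"], unit="ms", utc=True)

print(demo)
```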

In [None]:
quakes = quakes.drop(["time"], axis=1)

In [None]:
quakes.head()

In [None]:
quakes2 = quakes[["time_utc", "place", "mag", "geometry"]].copy()
quakes2.head()

In [None]:
quakes.sort_values(["mag"], ascending=False).head()

In [None]:
quakes.plot();

It would be nice to draw earthquakes on a map. So let's look at that next.

In [None]:
gdfCountries = gpd.read_file("geodata/packages/natural_earth_vector.gpkg", 
                              layer="ne_110m_admin_0_countries", 
                              encoding="utf-8")

In [None]:
gdfCountries.head()

In [None]:
ax = gdfCountries.plot(figsize=(20,10), facecolor="#BBFFBB", edgecolor="#000000")
quakes.plot(ax=ax, color="#005500", markersize=40);