## What are Shapefiles

The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.

## What are GeoJSON Files

GeoJSON is a format for encoding a variety of geographic data structures. GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes. It is based on the JSON format.

The features include points (for example, addresses and locations), line strings (streets, highways, and boundaries), polygons (countries, provinces, tracts of land), and multi-part collections of these types. GeoJSON features need not represent only entities of the physical world; mobile routing and navigation apps, for example, might describe their service coverage using GeoJSON.

![GeoJson](geojson1.png)
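
To make the structure concrete, here is a minimal sketch that builds a GeoJSON `FeatureCollection` using only the Python standard library. The feature names and coordinates are made up for illustration. The parts that come from the GeoJSON format itself are the `type`, `geometry`, `coordinates`, and `properties` keys and the longitude-first coordinate order.

```python
import json

# Hypothetical features for illustration; positions are [longitude, latitude].
well = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-58.66, -34.58]},
    "properties": {"name": "Example well"},
}
river = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[-58.70, -34.60], [-58.66, -34.58], [-58.60, -34.55]],
    },
    "properties": {"name": "Example river"},
}
lake = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # One exterior ring; the first and last positions must be identical.
        "coordinates": [[[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]]],
    },
    "properties": {"name": "Example lake"},
}

collection = {"type": "FeatureCollection", "features": [well, river, lake]}

# Serialize to GeoJSON text and parse it back to confirm it is plain JSON.
geojson_text = json.dumps(collection, indent=2)
round_tripped = json.loads(geojson_text)
geometry_types = [f["geometry"]["type"] for f in round_tripped["features"]]
print(geometry_types)
```

Because GeoJSON is ordinary JSON, any JSON library can read or write it; the non-spatial attributes live under `properties`.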

## What is GeoPandas

GeoPandas is an open source project that makes working with geospatial data in Python easier by adding support for geographic data to pandas objects. It extends the datatypes used by pandas to allow spatial operations on geometric types, and currently implements the `GeoSeries` and `GeoDataFrame` types, which are subclasses of `pandas.Series` and `pandas.DataFrame` respectively. GeoPandas objects can act on shapely geometry objects, and the geometric operations themselves are performed by shapely. GeoPandas further depends on fiona for file access and matplotlib for plotting.

In [1]:
import geopandas as gpd

In [2]:
geo_df = gpd.read_file(
    "../00_source_data/HIFLD_CollegeUniversityCampuses/"
    "HIFLD_CollegeUniversityCampuses.shp"
)

In [3]:
# Keep only the columns of interest, plus the geometry column.
columns = ['UNIQUEID', 'NAME', 'ADDRESS', 'CITY', 'STATE', 'ZIP', 'ZIP4',
           'TELEPHONE', 'POPULATION', 'COUNTY', 'COUNTYFIPS', 'COUNTRY',
           'TOT_ENROLL', 'geometry']
data = geo_df[columns]
data.head()

| | UNIQUEID | NAME | ADDRESS | CITY | STATE | ZIP | ZIP4 | TELEPHONE | POPULATION | COUNTY | COUNTYFIPS | COUNTRY | TOT_ENROLL | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 105367 | OTTAWA UNIVERSITY-PHOENIX | 9414 NORTH 25TH AVENUE | PHOENIX | AZ | 85021 | NOT AVAILABLE | (602) 371-1188 | 923 | MARICOPA | 4013 | USA | 809 | POLYGON ((-12480419.928 3971527.354, -12480311... |
| 1 | 110361 | CALIFORNIA BAPTIST UNIVERSITY | 8432 MAGNOLIA AVE | RIVERSIDE | CA | 92504 | 3297 | (877) 228-3615 | 12142 | RIVERSIDE | 6065 | USA | 10486 | POLYGON ((-13072418.831 4019240.575, -13071986... |
| 2 | 111391 | CALIFORNIA WESTERN SCHOOL OF LAW | 225 CEDAR ST | SAN DIEGO | CA | 92101 | 3046 | (619) 239-0391 | 1021 | SAN DIEGO | 6073 | USA | 802 | POLYGON ((-13042517.805 3858457.172, -13042435... |
| 3 | 113698 | DOMINICAN UNIVERSITY OF CALIFORNIA | 50 ACACIA AVE | SAN RAFAEL | CA | 94901 | 2298 | (415) 457-4440 | 2395 | MARIN | 6041 | USA | 1779 | MULTIPOLYGON (((-13637570.792 4576797.967, -13... |
| 4 | 192563 | LIU BRENTWOOD | MICHAEL J. GRANT CAMPUS, SUFFOLK COUNTY COMMUN... | BRENTWOOD | NY | 11717 | NOT AVAILABLE | (631) 287-8500 | -999 | SUFFOLK | 36103 | USA | -999 | POLYGON ((-8156290.559 4982259.419, -8156297.2... |


In [5]:
type(data)

geopandas.geodataframe.GeoDataFrame

In [4]:
data['geometry'].centroid.head()

0    POINT (-12480368.112 3971492.550)
1    POINT (-13071805.749 4019121.787)
2    POINT (-13042476.335 3858426.366)
3    POINT (-13638208.049 4576613.670)
4     POINT (-8156683.197 4982572.374)
dtype: geometry
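
For intuition about what `.centroid` returns, here is a minimal pure-Python sketch of the area-weighted centroid of a simple polygon, computed with the shoelace formula. It is not the GeoPandas or shapely implementation, which also handles holes and multi-part geometries. The function name `polygon_centroid` and the sample rectangle are made up for illustration.

```python
def polygon_centroid(ring):
    """Return the (x, y) centroid of a simple polygon without holes.

    `ring` is a sequence of (x, y) vertices; the closing vertex may be omitted.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring

    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    # Walk consecutive vertex pairs (the polygon's edges).
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area).
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centroid = polygon_centroid(rectangle)
print(centroid)  # (2.0, 1.0)
```

The result is a single point per geometry, which is why the output above contains one `POINT` for each row of the `geometry` column.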

### Geometry Columns
The `geometry` column holds shapely geometry objects. The types that appear in this notebook are:

1. Point
2. Polygon
3. MultiPolygon


![Geometry Column](geometry.png)

### Converting a pandas DataFrame into a GeoPandas GeoDataFrame

In [6]:
import pandas as pd
from shapely.geometry import Point
df = pd.DataFrame(
    {'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
     'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
     'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
     'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})

# Pair each longitude (x) with its latitude (y), then wrap each pair in a shapely Point.
df['Coordinates'] = list(zip(df.Longitude, df.Latitude))
df['Coordinates'] = df['Coordinates'].apply(Point)
print(df.head())
print(type(df))

           City    Country  Latitude  Longitude            Coordinates
0  Buenos Aires  Argentina    -34.58     -58.66  POINT (-58.66 -34.58)
1      Brasilia     Brazil    -15.78     -47.91  POINT (-47.91 -15.78)
2      Santiago      Chile    -33.45     -70.66  POINT (-70.66 -33.45)
3        Bogota   Colombia      4.60     -74.08     POINT (-74.08 4.6)
4       Caracas  Venezuela     10.48     -66.86   POINT (-66.86 10.48)
<class 'pandas.core.frame.DataFrame'>


In [7]:
df.Coordinates.x

AttributeError: 'Series' object has no attribute 'x'
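
The error arises because `df.Coordinates` is still an ordinary pandas `Series` of dtype `object` that happens to hold shapely `Point` objects; the vectorized `.x` and `.y` accessors belong to the GeoPandas `GeoSeries`. As a sketch, assuming pandas and shapely are installed, the coordinates can still be pulled out of a plain `Series` one element at a time. The two-row `small_df` below is a cut-down stand-in for `df`, not part of the original notebook.

```python
import pandas as pd
from shapely.geometry import Point

# Cut-down stand-in for the DataFrame built above.
small_df = pd.DataFrame(
    {"City": ["Buenos Aires", "Bogota"],
     "Latitude": [-34.58, 4.60],
     "Longitude": [-58.66, -74.08]}
)
small_df["Coordinates"] = list(zip(small_df.Longitude, small_df.Latitude))
small_df["Coordinates"] = small_df["Coordinates"].apply(Point)

# A plain Series has no vectorized geometry accessors ...
has_x = hasattr(small_df["Coordinates"], "x")

# ... but each element is a shapely Point, so .x works per element.
xs = small_df["Coordinates"].apply(lambda point: point.x)
print(has_x, xs.tolist())
```

Converting to a `GeoDataFrame`, as in the next cell, is usually the more convenient route because it makes these accessors available on the whole column.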

In [8]:
gdf = gpd.GeoDataFrame(df, geometry='Coordinates')
print(gdf)
print(type(gdf))

           City    Country  Latitude  Longitude                  Coordinates
0  Buenos Aires  Argentina    -34.58     -58.66  POINT (-58.66000 -34.58000)
1      Brasilia     Brazil    -15.78     -47.91  POINT (-47.91000 -15.78000)
2      Santiago      Chile    -33.45     -70.66  POINT (-70.66000 -33.45000)
3        Bogota   Colombia      4.60     -74.08    POINT (-74.08000 4.60000)
4       Caracas  Venezuela     10.48     -66.86   POINT (-66.86000 10.48000)
<class 'geopandas.geodataframe.GeoDataFrame'>


In [10]:
gdf.Coordinates.y

0   -34.58
1   -15.78
2   -33.45
3     4.60
4    10.48
dtype: float64
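
As an alternative to building `Point` objects by hand, GeoPandas provides `points_from_xy`. The sketch below assumes the latitude and longitude values are WGS 84 degrees (EPSG:4326), which the example above does not state, and records that as the coordinate reference system. The names `cities` and `cities_gdf` are illustrative.

```python
import geopandas as gpd
import pandas as pd

cities = pd.DataFrame(
    {"City": ["Buenos Aires", "Brasilia", "Santiago", "Bogota", "Caracas"],
     "Latitude": [-34.58, -15.78, -33.45, 4.60, 10.48],
     "Longitude": [-58.66, -47.91, -70.66, -74.08, -66.86]}
)

# points_from_xy takes x (longitude) first, then y (latitude).
cities_gdf = gpd.GeoDataFrame(
    cities,
    geometry=gpd.points_from_xy(cities["Longitude"], cities["Latitude"]),
    crs="EPSG:4326",  # assumption: coordinates are WGS 84 degrees
)

print(type(cities_gdf).__name__)
print(cities_gdf.geometry.y.tolist())
```

Setting a CRS at construction time is optional, but without one GeoPandas cannot reproject the data or check that two layers share the same coordinate system.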