# 2 - Python Support Engineer Test - Sample Creation

## Import Packages

All the packages used in this notebook are imported here.

In [1]:
import geopandas as gpd
import pandas as pd
from geopandas import points_from_xy
import cartoframes
from cartoframes.auth import set_default_credentials
from cartoframes.viz import Map, Layer, basemaps, size_continuous_style, size_continuous_legend, popup_element
from cartoframes import to_carto

To avoid typing the full file path for every file, I define a path template with a `{}` placeholder and bind its `format` method to `path`, so that `path('some_file')` returns the full path to that file. All of the files used for the project are located in the same folder, so only the filename changes.

In [2]:
path = r"/Users/x/Documents/GitHub/CARTO-skills-test/Python-Support-Engineer-Test/{}".format
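A `pathlib`-based helper is one alternative to the bound `format` method. The sketch below assumes a hypothetical base directory `BASE_DIR` and a hypothetical helper name `data_path`; neither appears in the original notebook.

```python
from pathlib import Path

# Hypothetical base directory, used only for illustration.
BASE_DIR = Path("/tmp/example-project")


def data_path(filename: str) -> Path:
    """Return the full path of a file stored directly inside BASE_DIR."""
    return BASE_DIR / filename


print(data_path("GED_Plus_Locations.csv"))
```

Joining with `/` lets `pathlib` handle separators, which matters mainly if the notebook is ever run on a different operating system.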

## Authentication

I call the `set_default_credentials` function with the path to the JSON file that holds my CARTO credentials (username and API key).

In [3]:
set_default_credentials(path('carto_api_creds.json'))
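For readers who have not created such a file, here is a rough sketch of writing one with the standard library. The key names `username` and `api_key` are an assumption about the layout the library expects, the values are placeholders, and the file is written to a temporary folder rather than the project folder.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values; the key names are an assumed layout, not confirmed by the notebook.
creds = {"username": "your_username", "api_key": "your_api_key"}

# Write the file to a temporary folder so nothing real is overwritten.
creds_file = Path(tempfile.mkdtemp()) / "carto_api_creds.json"
creds_file.write_text(json.dumps(creds, indent=2))

# Read the file back to confirm it is valid JSON with both fields present.
loaded = json.loads(creds_file.read_text())
print(sorted(loaded))
```

Keeping the credentials in a separate file like this means the API key never appears in the notebook itself.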

## Data Management

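Using geopandas, I read the GeoJSON of NYC school district boundaries, sort the rows by `shape_area`, and keep the first 10 features in a new dataframe. Note that `sort_values` sorts in ascending order by default, and the output below suggests `shape_area` is being compared as text (1604376900.74 sits between 150303501.172 and 175112135.368), so these 10 rows are not necessarily the largest districts.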

In [4]:
nysd = gpd.read_file(path('NYC School District Boundaries.geojson'))
nysd_sorted = nysd.sort_values('shape_area')
nysd_df = nysd_sorted.head(10)
nysd_df

Unnamed: 0,schooldist,shape_area,shape_leng,geometry
0,13,104887072.488,86613.4312514,"MULTIPOLYGON (((-73.97906 40.70595, -73.97924 ..."
25,3,113488436.929,52071.9902136,"MULTIPOLYGON (((-73.95672 40.78660, -73.95717 ..."
10,17,128439616.383,68280.3996219,"MULTIPOLYGON (((-73.92044 40.66563, -73.92061 ..."
28,14,150303501.172,95632.1353461,"MULTIPOLYGON (((-73.95440 40.73911, -73.95428 ..."
14,31,1604376900.74,434498.148932,"MULTIPOLYGON (((-74.05051 40.56642, -74.05047 ..."
8,18,175112135.368,121124.870704,"MULTIPOLYGON (((-73.86706 40.58209, -73.86769 ..."
5,15,196149368.081,153416.270604,"MULTIPOLYGON (((-73.98633 40.69105, -73.98536 ..."
17,19,203413267.971,184093.704175,"MULTIPOLYGON (((-73.84674 40.60485, -73.84672 ..."
26,21,210148762.165,123800.098064,"MULTIPOLYGON (((-73.96185 40.62757, -73.96139 ..."
29,20,242722878.703,95513.4821874,"MULTIPOLYGON (((-74.02553 40.65148, -74.02491 ..."
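If the goal is the ten largest districts by area, the column needs to be numeric and the sort descending. The following is a minimal sketch on made-up toy data, assuming `shape_area` is stored as text; the names `districts` and `largest` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the district table; the values are made up and stored as text.
districts = pd.DataFrame({
    "schooldist": ["A", "B", "C"],
    "shape_area": ["150.5", "1600.25", "175.0"],
})

# Convert the text column to numbers so sorting compares magnitudes, not characters.
districts["shape_area"] = pd.to_numeric(districts["shape_area"])

# Largest areas first, then keep the top two rows.
largest = districts.sort_values("shape_area", ascending=False).head(2)
print(largest["schooldist"].tolist())
```

Sorted as text, `"1600.25"` would fall between `"150.5"` and `"175.0"`; after conversion it is correctly ranked as the largest value.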


Using pandas, I read the CSV containing GED Plus Schools and display the resulting dataframe.

In [5]:
ged_df = pd.read_csv(path('GED_Plus_Locations.csv'))
ged_df

Unnamed: 0,Program Site name,Address,Borough,Contact Number,Notes,Postcode,Latitude,Longitude,Community Board,Council District,Census Tract,BIN,BBL,NTA
0,Bronx GED Plus Hub/Referral Center at Bronx Re...,1010 Rev. James Polite Avenue,Bronx,718-842-9200,,10459.0,40.823532,-73.898872,2.0,17.0,12901.0,2005366.0,2.026980e+09,Longwood ...
1,GED Plus at Bronx Community College Future Now,2155 University Avenue,Bronx,718-289-5852,,10453.0,40.857899,-73.909307,7.0,14.0,255.0,2014731.0,2.032170e+09,Kingsbridge Heights ...
2,GED Plus at Bronx VA Medical Center,"130 West Kingsbridge Road Bronx,",Bronx,"718-584-9000, ext. 5059",,10468.0,40.869073,-73.903069,7.0,14.0,261.0,2095229.0,2.032260e+09,Kingsbridge Heights ...
3,GED Plus at Davidson Avenue,1732 Davidson Avenue,Bronx,718-299-5926,,10453.0,40.847798,-73.913419,5.0,14.0,217.0,2008362.0,2.028610e+09,University Heights-Morris Heights ...
4,GED Plus at DeWitt Clinton High School,100 West Mosholu Parkway South,Bronx,718-543-1000,"x ELL Services: ESL, Bilingual Program: Spanish",10468.0,40.882178,-73.886910,7.0,11.0,409.0,2095215.0,2.032510e+09,Van Cortlandt Village ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,GED Plus Evening at John Adams High School,101-01 Rockaway Boulevard,Queens,718-557-2590,,11417.0,40.679639,-73.837298,10.0,32.0,864.0,4202849.0,4.095400e+09,South Ozone Park ...
59,GED Plus at St. George Referral Center/Hub,450 St. Marks Place,Staten Island,718-273-3225,,10301.0,40.638946,-74.077273,1.0,49.0,3.0,5000161.0,5.000160e+09,West New Brighton-New Brighton-St. George ...
60,GED Plus at Port Richmond High School,85 St. Joseph's Avenue,Staten Island,"718-420-2100, ext. 8472",,10302.0,40.634573,-74.141818,1.0,49.0,213.0,5026076.0,5.011210e+09,Port Richmond ...
61,"GED Plus at Sanitation, Staten Island",66 Swan Street,Staten Island,718-442-0071,,10301.0,40.634782,-74.077312,1.0,49.0,21.0,5013317.0,5.005040e+09,Stapleton-Rosebank ...


Next I convert the pandas dataframe to a geopandas geodataframe. `points_from_xy` turns the Longitude (x) and Latitude (y) columns into point geometries, and passing those along with the original dataframe to `GeoDataFrame` produces a new geodataframe. I display it to check that the new geometry column and its point objects were created correctly.

In [6]:
all_ged_points = gpd.GeoDataFrame(ged_df, geometry=points_from_xy(ged_df['Longitude'], ged_df['Latitude']))
all_ged_points

Unnamed: 0,Program Site name,Address,Borough,Contact Number,Notes,Postcode,Latitude,Longitude,Community Board,Council District,Census Tract,BIN,BBL,NTA,geometry
0,Bronx GED Plus Hub/Referral Center at Bronx Re...,1010 Rev. James Polite Avenue,Bronx,718-842-9200,,10459.0,40.823532,-73.898872,2.0,17.0,12901.0,2005366.0,2.026980e+09,Longwood ...,POINT (-73.89887 40.82353)
1,GED Plus at Bronx Community College Future Now,2155 University Avenue,Bronx,718-289-5852,,10453.0,40.857899,-73.909307,7.0,14.0,255.0,2014731.0,2.032170e+09,Kingsbridge Heights ...,POINT (-73.90931 40.85790)
2,GED Plus at Bronx VA Medical Center,"130 West Kingsbridge Road Bronx,",Bronx,"718-584-9000, ext. 5059",,10468.0,40.869073,-73.903069,7.0,14.0,261.0,2095229.0,2.032260e+09,Kingsbridge Heights ...,POINT (-73.90307 40.86907)
3,GED Plus at Davidson Avenue,1732 Davidson Avenue,Bronx,718-299-5926,,10453.0,40.847798,-73.913419,5.0,14.0,217.0,2008362.0,2.028610e+09,University Heights-Morris Heights ...,POINT (-73.91342 40.84780)
4,GED Plus at DeWitt Clinton High School,100 West Mosholu Parkway South,Bronx,718-543-1000,"x ELL Services: ESL, Bilingual Program: Spanish",10468.0,40.882178,-73.886910,7.0,11.0,409.0,2095215.0,2.032510e+09,Van Cortlandt Village ...,POINT (-73.88691 40.88218)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,GED Plus Evening at John Adams High School,101-01 Rockaway Boulevard,Queens,718-557-2590,,11417.0,40.679639,-73.837298,10.0,32.0,864.0,4202849.0,4.095400e+09,South Ozone Park ...,POINT (-73.83730 40.67964)
59,GED Plus at St. George Referral Center/Hub,450 St. Marks Place,Staten Island,718-273-3225,,10301.0,40.638946,-74.077273,1.0,49.0,3.0,5000161.0,5.000160e+09,West New Brighton-New Brighton-St. George ...,POINT (-74.07727 40.63895)
60,GED Plus at Port Richmond High School,85 St. Joseph's Avenue,Staten Island,"718-420-2100, ext. 8472",,10302.0,40.634573,-74.141818,1.0,49.0,213.0,5026076.0,5.011210e+09,Port Richmond ...,POINT (-74.14182 40.63457)
61,"GED Plus at Sanitation, Staten Island",66 Swan Street,Staten Island,718-442-0071,,10301.0,40.634782,-74.077312,1.0,49.0,21.0,5013317.0,5.005040e+09,Stapleton-Rosebank ...,POINT (-74.07731 40.63478)
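Because `points_from_xy` takes x (longitude) before y (latitude), swapped or missing coordinates are an easy mistake to miss. The sketch below is one way to flag suspect rows before building the geometry; the toy data, the rough bounding box around New York City, and the names `sites`, `in_box` and `suspect` are all assumptions for illustration.

```python
import pandas as pd

# Toy rows with made-up coordinates: site_b has the two columns swapped,
# and site_c is missing its latitude.
sites = pd.DataFrame({
    "name": ["site_a", "site_b", "site_c"],
    "Latitude": [40.68, -73.95, None],
    "Longitude": [-73.94, 40.70, -73.90],
})

# Rough, assumed bounding box around New York City (not an official extent).
in_box = sites["Latitude"].between(40, 41) & sites["Longitude"].between(-75, -73)

# Rows outside the box (including rows with missing values) deserve a closer look.
suspect = sites[~in_box]
print(suspect["name"].tolist())
```

`between` returns `False` for missing values, so rows without coordinates are flagged along with rows whose coordinates fall outside the box.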


Here I filter the dataset down to a single borough, Brooklyn, and then create a new geodataframe containing the first 10 points in that borough.

In [7]:
brooklyn_ged_points = all_ged_points.query('Borough == "Brooklyn"')
ged_points = brooklyn_ged_points.head(10)

NumExpr defaulting to 8 threads.
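The `query` call is equivalent to filtering with a boolean mask. A small sketch on made-up data shows the two forms side by side; the dataframe and variable names here are hypothetical.

```python
import pandas as pd

# Toy table with made-up site names.
sites = pd.DataFrame({
    "name": ["s1", "s2", "s3", "s4"],
    "Borough": ["Bronx", "Brooklyn", "Queens", "Brooklyn"],
})

# Filter with a query string, as in the cell above.
by_query = sites.query('Borough == "Brooklyn"')

# Filter with a boolean mask; this selects the same rows.
by_mask = sites[sites["Borough"] == "Brooklyn"]

print(by_query.equals(by_mask))

# head(n) then keeps the first n matching rows in their original order.
first_match = by_mask.head(1)
```

Either form works; the mask version avoids quoting the value inside a string, which can be convenient when the borough name comes from a variable.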


To map my polygons and points, I use the `Map` and `Layer` classes from the CARTOframes package: each dataframe becomes a `Layer`, and the `Map` draws both layers over a basemap.

Now I have a Map that displays the first 10 GED Plus Schools in Brooklyn together with the 10 school districts selected above.

In [8]:
nysd_map = Map(
    [
        # Polygon layer: the 10 selected school districts
        Layer(
            nysd_df,
            legends=size_continuous_legend(
                title='10 Largest School Districts in NYC',
                description='First 10 GED Plus Schools in Brooklyn'),
            popup_hover=[popup_element('schooldist', 'School District')]),
        # Point layer: the first 10 GED Plus locations in Brooklyn
        Layer(
            ged_points,
            popup_hover=[popup_element('Program Site name', 'School Name')]),
    ],
    basemaps.voyager,
    theme='dark')
nysd_map

## Upload Data to CARTO

To upload my two tables of 10 rows each, I use the `to_carto` function, passing the source dataframe and the table name that will appear in my CARTO datasets. Setting `if_exists='replace'` overwrites any existing table with the same name.

In [9]:
to_carto(nysd_df, 'ten_ny_school_districts', if_exists='replace')
to_carto(ged_points, 'ten_brooklyn_ged_locations', if_exists='replace')

Success! Data uploaded to table "ten_ny_school_districts" correctly
Success! Data uploaded to table "ten_brooklyn_ged_locations" correctly


'ten_brooklyn_ged_locations'