# Day 3 Talks

May 1, 2022

## Who Said Wrangling Geospatial Data at Scale was Easy?

presenter: Brendan Collins  
link: https://us.pycon.org/2022/schedule/presentation/85/

> If you have ever worked with Census Data, you may be recalling nightmares of hours spent staring at data and finding it impossible to download, store, or convert to a sensible format to begin your analysis.
> And Census data is not even unstructured data!
> 
> Geospatial Data comes in various formats - GeoJSON, Parquet, Shapefiles, GeoTIFF, etc.
> But what are the most efficient ways to convert the data into formats that are easy to understand, work with, transfer, and ultimately analyze?
> Then throw in petabytes worth of data and you hit the challenge of wrangling geospatial data at scale.
> 
> This talk will walk through some of the best ways to handle geospatial data at scale, with a focus on:
>
> - The xarray-spatial library for raster-based spatial analysis.
> - The RTXpy for GPU-powered spatial analysis.
> - Microsoft Planetary Computer examples of geospatial data processing.

- many different formats for geospatial data
    - multiple standards for different types of data
- "vector" in geospatial: points, lines, and polygons
    - discrete data
- heavy use of Parquet formats
    - binary (instead of text based)
    - supports many compression formats
    - column format
    - can be partitioned
    - fast IO and allows easy scaling
- pandas and GeoPandas
    - geopandas: similar API to pandas for geospatial data
    - dask-geopandas: dask abstractions for geopandas
- raster data
    - represent continuous measurements
    - NumPy is essential for working with raster data
    - xarray to replace NumPy with labeled axes
        - xarray-spatial: spatial extension for xarray
    - datashader: library for quickly rasterizing vectors and working jointly with vector and raster data
- scaling:
    - heavy use of Numba to make current Python code faster
    - Dask important for scaling across multiple CPUs
        - understands Numba functions
    - CuPy: NumPy-like interface for working with arrays on Nvidia GPU
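
The relationship between discrete vector data and a raster grid can be illustrated with a minimal sketch. This is not datashader's API: `rasterize_points` and the sample points are hypothetical, and plain Python lists stand in for the NumPy or xarray arrays a real pipeline would use. It only shows the basic idea of binning point coordinates into grid cells.

```python
def rasterize_points(points, x_range, y_range, width, height):
    """Count how many (x, y) points fall into each cell of a height x width grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # skip points outside the raster extent
        # Scale each coordinate to a cell index; row 0 corresponds to y_min.
        col = int((x - x_min) / (x_max - x_min) * width)
        row = int((y - y_min) / (y_max - y_min) * height)
        grid[row][col] += 1
    return grid


# Hypothetical vector data: point coordinates in arbitrary map units.
points = [(0.5, 0.5), (0.6, 0.4), (2.5, 1.5), (3.9, 3.9), (9.0, 9.0)]
raster = rasterize_points(points, (0, 4), (0, 4), width=4, height=4)
for row in raster:
    print(row)
```

Each cell of `raster` holds a count, so the discrete points become a grid of measurements that array tools can process. The point at `(9.0, 9.0)` lies outside the extent and is dropped.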
    

## Productionize Research Oriented Code By Python

presenter: Tetsuya Jesse Hirata  
link: https://us.pycon.org/2022/schedule/presentation/77/

> Target audiences might be python engineers who is involved with R&D, data science, AI/ML projects, or data oriented projects.
> 
> **Introduction**
> - Background
> - Definition of Research Oriented Code and Production Code
> - Differences between Research Oriented Code and Production Code
> 
> **Main Talk**
> 
> Four steps to productionize research oriented code
> 1. Understand the code through code reading and code documentation
> 2. Modularize the code into preparation code, pre/post-process code, calculation code based on the code documents
> 3. Refactor the code with test code
> 4. Make them products
>
> **Summary**
> - Summarize the four steps to productionize research oriented code
> - After making the code products, improve performance and monitor the behaviors of production code
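
Steps 2 and 3 of the abstract could look roughly like the sketch below. It is not taken from the talk: the function names, the sample data, and the choice of a mean as the "calculation" are all assumptions made for illustration. The point is only that once a script is split into preparation, pre-process, calculation, and post-process functions, each piece can be covered by a small test before refactoring.

```python
from statistics import mean


def prepare():
    """Preparation: load inputs (hard-coded here; a real project would read files)."""
    return ["1.0", "", "2.0", None, "4.5"]


def preprocess(raw_rows):
    """Pre-process: drop missing values and convert the rest to floats."""
    return [float(v) for v in raw_rows if v not in ("", None)]


def calculate(values):
    """Calculation: the core research logic (a simple mean in this sketch)."""
    if not values:
        raise ValueError("no valid values to average")
    return mean(values)


def postprocess(result, digits=2):
    """Post-process: shape the result for downstream consumers."""
    return {"mean": round(result, digits)}


def run(raw_rows):
    """Chain the modular steps into one pipeline."""
    return postprocess(calculate(preprocess(raw_rows)))


def test_run_ignores_missing_values():
    # A test like this pins down behavior before any refactoring.
    assert run(prepare()) == {"mean": 2.5}


test_run_ignores_missing_values()
print(run(prepare()))
```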

## Building a Python Code Completer

presenter: Meredydd Luff  
link: https://us.pycon.org/2022/schedule/presentation/129/

> Code completion is almost magic, and it makes writing code feel so good.
> But how does it actually work?
> I built a code completion engine from scratch – and in this talk, I'll tell you its secrets.
> 
> We'll learn how Python parses and compiles code, what an AST is, and how we can use this knowledge to work out what a programmer might type next.
> And to prove it's not that complicated, I'll build a little code completer, live on stage, in about five minutes.
>
> I'll also talk about how code completion is like games programming, how we should broaden our thinking about "types" in Python, and how we can use information that isn't in your code to make coding even more satisfying.

- use the built-in `ast` module to parse the code
    - insert a known token at the current cursor position
    - search through the AST until the known cursor token is found
    - the node holding the token gives the context of the cursor within the current state of the program

```python
import ast

tree = ast.parse("x = 2 + 5")
print(ast.dump(tree, indent=2))
```

```text
Module(
  body=[
    Assign(
      targets=[
        Name(id='x', ctx=Store())],
      value=BinOp(
        left=Constant(value=2),
        op=Add(),
        right=Constant(value=5)))],
  type_ignores=[])
```
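
The cursor-token steps listed earlier can be sketched with the same `ast` module. This is a minimal illustration, not the presenter's implementation: the placeholder `__CURSOR__` and the functions `cursor_context` and `complete_name` are hypothetical names. It assumes the code still parses once the placeholder is inserted, and it suggests only names assigned elsewhere in the same source.

```python
import ast

CURSOR = "__CURSOR__"  # hypothetical placeholder; any valid, unlikely identifier works


def cursor_context(source, cursor_pos):
    """Insert the placeholder at the cursor, parse, and find the node holding it."""
    marked = source[:cursor_pos] + CURSOR + source[cursor_pos:]
    tree = ast.parse(marked)  # raises SyntaxError if the marked code does not parse
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and CURSOR in node.id:
            # Cursor is on a bare name; the text before the placeholder is the prefix.
            return tree, "name", None, node.id.split(CURSOR)[0]
        if isinstance(node, ast.Attribute) and CURSOR in node.attr:
            # Cursor follows a dot; report the expression being accessed.
            return tree, "attribute", ast.unparse(node.value), node.attr.split(CURSOR)[0]
    return tree, None, None, ""


def complete_name(source, cursor_pos):
    """Suggest names assigned in the source that start with the typed prefix."""
    tree, kind, _, prefix = cursor_context(source, cursor_pos)
    if kind != "name":
        return []
    assigned = {
        n.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
    }
    return sorted(n for n in assigned if n.startswith(prefix) and CURSOR not in n)


code = "total = 2 + 5\ntoken = 1\nresult = to"
suggestions = complete_name(code, len(code))
print(suggestions)
```

For attribute access such as `os.pa`, `cursor_context` reports the kind `"attribute"` and the object expression `os`. Turning that into suggestions would need knowledge of the object's type, which this sketch leaves out.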


---

```python
%load_ext watermark
%watermark -d -u -v -iv -b -h -m
```

```text
Last updated: 2022-05-01

Python implementation: CPython
Python version       : 3.10.4
IPython version      : 8.2.0

Compiler    : Clang 12.0.1 
OS          : Darwin
Release     : 21.4.0
Machine     : x86_64
Processor   : i386
CPU cores   : 4
Architecture: 64bit

Hostname: JHCookMac.local

Git branch: pycon2022
```

