# Getting Started with geosnap  
  
  
  
  

The geosnap package is designed for urban studies researchers who want to explore, model, and analyze the social and spatial dynamics of neighborhoods. Although neighborhoods are critically important for human development and public policy, they present a variety of novel challenges for quantitative researchers. Since there is no accepted definition of "neighborhood," most quantitative studies of [neighborhood effects](https://www.annualreviews.org/doi/10.1146/annurev.soc.28.110601.141114) or [neighborhood dynamics](https://www.sciencedirect.com/science/article/pii/S0094119000921818) use census data and their administrative boundaries to define spatial units that reasonably approximate neighborhoods. In the U.S., this typically means using census tracts, since they have a relatively small spatial footprint and a wide variety of variables are tabulated at that scale. For this reason, geosnap's first release targets researchers working with U.S. Census tract data, which allows the software to provide a wide variety of data and commonly used variables with minimal setup from the end user. Later releases will expand functionality to other geographies and data sources.
  
   
   


You can consult geosnap's data dictionary to see which variables are available for analysis, the census tabulations they are derived from, and the naming conventions they follow. The dictionary is available as a pandas DataFrame at `geosnap.data.dictionary`. Variables that represent counts are prefixed with `n_`, whereas percentages are prefixed with `p_`.

```python
import geosnap

geosnap.data.dictionary
```
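Because the prefixes are consistent, count and percentage variables can be separated with ordinary pandas column filtering. The minimal sketch below assumes only pandas and builds a small frame whose column names and values are copied from the first two rows of the LTDB preview shown later in this guide; it does not read geosnap's own tables, though the same `filter` call should work on any DataFrame that follows the `n_`/`p_` convention.

```python
import pandas as pd

# Small frame copied from the first two rows of the LTDB preview in this guide.
df = pd.DataFrame(
    {
        "n_black_under_15": [1.0, 609.0],
        "n_white_under_15": [2.0, 639.0],
        "p_black_over_60": [4.0, 6.0],
        "p_disabled": [5.0, 6.0],
        "year": [1970, 1970],
    },
    index=pd.Index(["1001020500", "1003010100"], name="geoid"),
)

# Columns whose names start with "n_" hold counts...
counts = df.filter(regex=r"^n_")
# ...and columns starting with "p_" hold percentages; "year" matches neither.
shares = df.filter(regex=r"^p_")
```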


  
Unfortunately, U.S. Census tracts have several analytical drawbacks. Most notably, tracts are redrawn with each decennial census to reflect population change, which makes *temporal* analyses of neighborhoods particularly challenging because the units of analysis are not stable over time. geosnap addresses this challenge in two ways:

First, geosnap can simply leverage existing data that has already been standardized into a set of consistent units. Its `data` module provides tools for reading and storing existing longitudinal databases that, once ingested, can be queried and analyzed repeatedly. This is a good option for researchers who want to get started modeling neighborhood characteristics right away and are less interested in exploring how error propagates through spatial interpolation.   

Second, geosnap can create its own set of stable longitudinal units of analysis and convert raw census or other data into those units. Its `harmonize` module provides tools for researchers to define a set of geographic units and interpolate data into those units using modern spatial statistical methods. This is a good option for researchers who are interested in how different interpolation methods can affect their analyses, or who want to use state-of-the-art methods to create longitudinal datasets that are more accurate than those provided by existing databases.
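To make the idea of interpolation concrete, the simplest approach is areal weighting: each source tract's count is split among target units in proportion to the share of the tract's area that falls in each target. The plain-Python sketch below assumes the overlap areas are already known and that each count is spread uniformly over its source tract; it illustrates the basic idea only and is not the specific method implemented in geosnap's `harmonize` module. The tract names, areas, and counts are hypothetical.

```python
def areal_weighted_counts(source_counts, source_areas, overlaps):
    """Reallocate extensive (count) variables from source to target zones.

    source_counts: {source_id: count}
    source_areas:  {source_id: total area of that source zone}
    overlaps:      {(source_id, target_id): area of their intersection}

    Assumes each count is spread uniformly over its source zone's area.
    """
    target_counts = {}
    for (source_id, target_id), overlap_area in overlaps.items():
        # Fraction of the source zone that lies inside this target zone.
        weight = overlap_area / source_areas[source_id]
        share = source_counts[source_id] * weight
        target_counts[target_id] = target_counts.get(target_id, 0.0) + share
    return target_counts


# Hypothetical example: two old tracts re-aggregated into two new units.
counts = {"old_a": 1000, "old_b": 400}
areas = {"old_a": 2.0, "old_b": 1.0}
overlaps = {
    ("old_a", "new_1"): 1.5,  # three quarters of old_a falls in new_1
    ("old_a", "new_2"): 0.5,  # the remaining quarter falls in new_2
    ("old_b", "new_2"): 1.0,  # all of old_b falls in new_2
}
result = areal_weighted_counts(counts, areas, overlaps)
# → {'new_1': 750.0, 'new_2': 650.0}
```

Only counts can be reallocated this way; percentages such as the `p_` variables are better recomputed from interpolated counts than split by area.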


---
## Importing Data from External Databases

The quickest way to get started with geosnap is to import pre-harmonized census data from either
- the [Longitudinal Tract Database (LTDB)](https://s4.ad.brown.edu/projects/diversity/Researcher/LTDB.htm), created by researchers at Brown University, or
- the [Neighborhood Change Database (NCDB)](http://www.geolytics.com/USCensus,Neighborhood-Change-Database-1970-2000,Products.asp), created by Geolytics.


**Although licensing restrictions prevent either of these databases from being distributed with geosnap, LTDB is *free* to download. We therefore recommend importing LTDB data before getting started with geosnap.**



### Longitudinal Tract Database (LTDB)

The [Longitudinal Tract Database
(LTDB)](https://s4.ad.brown.edu/projects/diversity/Researcher/LTDB.htm) is a
freely available dataset developed by researchers at Brown University that
provides census data harmonized to 2010 boundaries.

To import LTDB data into geosnap, proceed with the following:

1. Download the raw data from the LTDB [downloads
  page](https://s4.ad.brown.edu/projects/diversity/Researcher/LTBDDload/Default.aspx).
  Note that to construct the entire database you will need two archives: one
  containing the sample variables, and another containing the "full count"
  variables.
    - Use the dropdown menu called **select file type** and choose "full"; in
      the dropdown called **select a year**, choose "All Years"
    - Click the button "Download Standard Data Files"
    - Repeat the process, this time selecting "sample" in the **select file
      type** menu and "All years" in the **select a year** dropdown
2. Note the location of the two zip archives you downloaded. By default they are called 
    - `LTDB_Std_All_Sample.zip` and
    - `LTDB_Std_All_fullcount.zip`

3. Start IPython or Jupyter, import geosnap, and call the `store_ltdb` function with the paths of the two zip archives you downloaded from the LTDB project page:


```python
import geosnap

# if the archives were in your Downloads folder, the paths might look like this
sample = "/Users/knaaptime/Downloads/LTDB_Std_All_Sample.zip"
full = "/Users/knaaptime/Downloads/LTDB_Std_All_fullcount.zip"

geosnap.data.store_ltdb(sample=sample, fullcount=full)
```
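The paths above are specific to one machine. A slightly more portable pattern, sketched below with only the standard library, builds the paths from a download folder and fails early if an archive is missing. The helper name `ltdb_archive_paths` is hypothetical and not part of geosnap; it assumes the archives keep their default names and that `store_ltdb` accepts the `sample` and `fullcount` keyword arguments used in the call above.

```python
from pathlib import Path

# Default archive names as downloaded from the LTDB project page.
LTDB_ARCHIVES = {
    "sample": "LTDB_Std_All_Sample.zip",
    "fullcount": "LTDB_Std_All_fullcount.zip",
}


def ltdb_archive_paths(download_dir):
    """Hypothetical helper: return {'sample': ..., 'fullcount': ...} paths.

    Expands "~", checks that each archive exists, and raises
    FileNotFoundError otherwise. The keys mirror the store_ltdb keywords.
    """
    download_dir = Path(download_dir).expanduser()
    paths = {}
    for key, filename in LTDB_ARCHIVES.items():
        path = download_dir / filename
        if not path.is_file():
            raise FileNotFoundError(f"expected LTDB archive at {path}")
        paths[key] = str(path)
    return paths


# Usage (requires geosnap and the downloaded archives):
# import geosnap
# geosnap.data.store_ltdb(**ltdb_archive_paths("~/Downloads"))
```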

  
  
The `store_ltdb` function extracts the necessary data from the archives, calculates some additional variables, and stores the result as a long-form DataFrame (using the efficient Apache Parquet format) for later use, so you should only need to run it once. The dataset will be available to geosnap internally when you instantiate a `Community`, but you can also access the complete raw data with `geosnap.data.db.ltdb`:
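In long-form data each row describes one tract in one census year, so a `geoid` repeats once per year and `year` is an ordinary column. The sketch below assumes only pandas and rebuilds a tiny long-form frame from the `n_total_housing_units` values for tract 1001020100 in the NCDB preview later in this guide. It shows how to slice a single year and how to pivot to a wide, tract-by-year layout; it illustrates the layout only and makes no assumption about the exact dtypes geosnap stores.

```python
import pandas as pd

# Long-form layout: one row per (geoid, year). Values copied from the
# NCDB preview in this guide for a single tract.
long_df = pd.DataFrame(
    {
        "geoid": ["1001020100"] * 4,
        "year": [1980, 1990, 2000, 2010],
        "n_total_housing_units": [555, 697, 741, 752],
    }
).set_index("geoid")

# Slice one census year; the result is still indexed by geoid.
units_2000 = long_df[long_df["year"] == 2000]

# Pivot to wide form: one row per tract, one column per year.
wide = long_df.reset_index().pivot(
    index="geoid", columns="year", values="n_total_housing_units"
)

# Change in housing units between the first and last years.
change = wide[2010] - wide[1980]
```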

```python
geosnap.data.db.ltdb.head()
```

| geoid | n_asian_under_15 | n_black_under_15 | n_hispanic_under_15 | n_native_under_15 | n_white_under_15 | n_persons_under_18 | n_asian_over_60 | n_black_over_60 | n_hispanic_over_60 | n_native_over_60 | ... | n_white_persons | year | n_total_housing_units_sample | p_nonhisp_white_persons | p_white_over_60 | p_black_over_60 | p_hispanic_over_60 | p_native_over_60 | p_asian_over_60 | p_disabled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1001020500 | NaN | 1.0 | NaN | NaN | 2.0 | 3.0 | NaN | 0.0 | NaN | NaN | ... | 6.0 | 1970 | 2.0 | NaN | 6.0 | 4.0 | NaN | NaN | NaN | 5.0 |
| 1003010100 | NaN | 609.0 | NaN | NaN | 639.0 | 1407.0 | NaN | 221.0 | NaN | NaN | ... | 2004.0 | 1970 | 1106.0 | NaN | 8.0 | 6.0 | NaN | NaN | NaN | 6.0 |
| 1003010200 | NaN | 38.0 | NaN | NaN | 564.0 | 687.0 | NaN | 28.0 | NaN | NaN | ... | 1758.0 | 1970 | 619.0 | NaN | 13.0 | 1.0 | NaN | NaN | NaN | 6.0 |
| 1003010300 | NaN | 375.0 | NaN | NaN | 982.0 | 1524.0 | NaN | 104.0 | NaN | NaN | ... | 2835.0 | 1970 | 1026.0 | NaN | 8.0 | 3.0 | NaN | NaN | NaN | 7.0 |
| 1003010400 | NaN | 113.0 | NaN | NaN | 797.0 | 1030.0 | NaN | 37.0 | NaN | NaN | ... | 2323.0 | 1970 | 780.0 | NaN | 11.0 | 1.0 | NaN | NaN | NaN | 11.0 |


---

### Geolytics Neighborhood Change Database

The Neighborhood Change Database (NCDB) is a commercial database created by Geolytics and the Urban Institute. Like LTDB, it provides census data harmonized to 2010 tracts. NCDB data must be purchased from Geolytics before use. If you have a license, you can import NCDB into geosnap as follows:

1. Open the Geolytics application
2. Choose "New Request":   
![Choose "New Request"](https://raw.githubusercontent.com/spatialucr/geosnap/master/geosnap/data/geolytics/geolytics_interface1.PNG)
3. Select CSV or DBF
4. Make the following selections:
    - **year**: all years in 2010 boundaries
    - **area**: all census tracts in the entire United States
    - **counts**: [right click] Check All Sibling Nodes

![](https://raw.githubusercontent.com/spatialucr/geosnap/master/geosnap/data/geolytics/geolytics_interface2.PNG)

5. Click `Run Report`

6. Note the name and location of the CSV you created

7. Start IPython or Jupyter, import geosnap, and call the `store_ncdb` function with the path of the CSV:


```python
import geosnap

ncdb_path = "~/Downloads/geolytics_full.csv"

geosnap.data.store_ncdb(ncdb_path)
```


```python
geosnap.data.db.ncdb.head()
```

| geoid | year | n_mexican_pop | n_cuban_pop | n_puerto_rican_pop | n_foreign_born_pop | n_naturalized_pop | p_foreign_born_pop | n_total_housing_units | n_vacant_housing_units | n_occupied_housing_units | ... | p_naturalized_pop | p_vacant_housing_units | p_owner_occupied_units | p_married | p_female_headed_families | p_nonhisp_white_persons | p_employed_professional | p_employed_manufacturing | p_poverty_rate_hispanic | p_poverty_rate_native |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1001020100 | 1980 | 0 | 0.0 | 0 | 9.0 | NaN | 1.0 | 555 | 43 | 511 | ... | NaN | 8.0 | 92.0 | inf | inf | 91.0 | NaN | NaN | 0.0 | 0.0 |
| 1001020100 | 1990 | 0 | 0.0 | 0 | 0.0 | NaN | 0.0 | 697 | 45 | 651 | ... | NaN | 6.0 | 93.0 | 31.0 | 10.0 | 99.0 | NaN | NaN | 0.0 | 0.0 |
| 1001020100 | 2000 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 741 | 81 | 659 | ... | 0.0 | 11.0 | 89.0 | 32.0 | 10.0 | 96.0 | NaN | NaN | 0.0 | 0.0 |
| 1001020100 | 2010 | 30 | 1.0 | 2 | 18.0 | 0.0 | 1.0 | 752 | 59 | 693 | ... | 0.0 | 8.0 | 92.0 | 31.0 | NaN | 84.0 | NaN | NaN | NaN | NaN |
| 1001020200 | 1980 | 7 | 0.0 | 0 | 22.0 | NaN | 1.0 | 741 | 48 | 693 | ... | NaN | 6.0 | 94.0 | 24.0 | 14.0 | 38.0 | NaN | NaN | 0.0 | 0.0 |
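The preview hints at how the `p_` variables relate to the `n_` counts: dividing `n_vacant_housing_units` by `n_total_housing_units`, multiplying by 100, and rounding to a whole number reproduces `p_vacant_housing_units` for every row shown. The pandas sketch below checks that relationship using the preview's own values. It is an inference from these five rows, not a description of how geosnap or Geolytics actually computes the variable. The `inf` entries in `p_married` are also worth handling: float division by zero yields `inf` in pandas, so these plausibly come from a zero denominator, and the sketch treats them as missing.

```python
import numpy as np
import pandas as pd

# Five rows copied from the NCDB preview above.
ncdb = pd.DataFrame(
    {
        "year": [1980, 1990, 2000, 2010, 1980],
        "n_total_housing_units": [555, 697, 741, 752, 741],
        "n_vacant_housing_units": [43, 45, 81, 59, 48],
        "p_vacant_housing_units": [8.0, 6.0, 11.0, 8.0, 6.0],
        "p_married": [np.inf, 31.0, 32.0, 31.0, 24.0],
    },
    index=pd.Index(["1001020100"] * 4 + ["1001020200"], name="geoid"),
)

# Recompute the vacancy percentage from the two counts (assumed formula).
recomputed = (
    ncdb["n_vacant_housing_units"] / ncdb["n_total_housing_units"] * 100
).round()
matches = (recomputed == ncdb["p_vacant_housing_units"]).all()

# Replace +/-inf with NaN so summaries such as means skip those rows.
cleaned = ncdb.replace([np.inf, -np.inf], np.nan)
mean_married = cleaned["p_married"].mean()
```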
