GitHub - harrydurbin/tweet-data: Data analysis for one million geo-tagged spatial points

Data analysis for one million geo-tagged spatial points

Cluster with Minibatch Kmeans
Sub-cluster with DBSCAN
Write data to PSQL/PostGIS database
Import county shapefile and census population data
Query tweets per capita for each county
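
The final step, tweets per capita for each county, comes down to a point-in-polygon join followed by a division by population. The sketch below is an in-memory stand-in for that idea in plain Python, not the PostGIS query used in the notebook. The square "counties", the population figures, and the helper names `point_in_polygon` and `tweets_per_capita` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tweets_per_capita(tweets, counties, population):
    """Count tweets falling inside each county and divide by its population."""
    counts = Counter()
    for x, y in tweets:
        for name, polygon in counties.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assume counties do not overlap
    return {name: counts[name] / population[name] for name in counties}


# Hypothetical toy data: two unit-square "counties" side by side.
counties = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
population = {"A": 1000, "B": 250}
tweets = [(0.2, 0.3), (0.8, 0.5), (1.5, 0.5), (3.0, 3.0)]  # last point is outside both

rates = tweets_per_capita(tweets, counties, population)
```

In a spatial database the same join would normally be expressed with a containment predicate and a `GROUP BY` on the county, which scales far better than this nested loop.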

See the IPython Notebook code here.

Figure 1: Geo-tagged tweets in the Bay Area (color per cluster)

Figure 2: Final cluster boundaries after running DBSCAN

Figure 3: Choropleth indicating tweets per capita for each county

For clustering, a hierarchical approach was used to overcome memory limitations. One million data points are too many for DBSCAN to handle at once, so minibatch K-means was first used to quickly break the data down into a small number of coarse cluster groups. Minibatch K-means takes a predetermined number of clusters and iteratively updates each centroid from small random batches of points until the centroids converge. DBSCAN, in contrast, uses a stipulated distance (epsilon) to connect neighboring data points and establish clusters, so the number of clusters does not have to be chosen in advance. Clusters containing fewer than 1,000 tweets were considered insignificant and eliminated.
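
The two-stage idea can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic points, not the notebook's code: the function name `hierarchical_cluster`, the parameter values (`n_coarse`, `eps`, `min_samples`, `min_size`), and the toy data are all assumptions, and the size threshold is scaled down from 1,000 to suit the small sample.

```python
import numpy as np
from sklearn.cluster import DBSCAN, MiniBatchKMeans


def hierarchical_cluster(points, n_coarse=4, eps=0.05, min_samples=5,
                         min_size=50, seed=0):
    """Coarse MiniBatchKMeans pass, then DBSCAN inside each coarse group.

    Returns one label per point; -1 marks DBSCAN noise or a cluster that
    was dropped for having fewer than min_size members.
    """
    coarse = MiniBatchKMeans(
        n_clusters=n_coarse, random_state=seed, n_init=3
    ).fit_predict(points)

    labels = np.full(len(points), -1, dtype=int)
    next_label = 0
    for group in range(n_coarse):
        idx = np.flatnonzero(coarse == group)
        if idx.size == 0:
            continue
        # DBSCAN only ever sees one coarse group, which bounds its memory use.
        sub = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points[idx])
        for sub_label in np.unique(sub):
            if sub_label == -1:
                continue  # DBSCAN noise
            members = idx[sub == sub_label]
            if members.size < min_size:
                continue  # treat small clusters as insignificant
            labels[members] = next_label
            next_label += 1
    return labels


# Synthetic stand-in for geo-tagged points: two dense blobs plus sparse noise.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(-122.4, 37.8), scale=0.01, size=(300, 2))
blob_b = rng.normal(loc=(-121.9, 37.3), scale=0.01, size=(300, 2))
noise = rng.uniform(low=(-123.0, 37.0), high=(-121.5, 38.5), size=(30, 2))
points = np.vstack([blob_a, blob_b, noise])

labels = hierarchical_cluster(points, n_coarse=2, eps=0.02,
                              min_samples=5, min_size=100)
kept_sizes = {int(k): int(np.sum(labels == k)) for k in np.unique(labels) if k != -1}
```

One trade-off of this design is that a dense region straddling a K-means boundary is split into separate DBSCAN clusters, so the coarse cluster count should stay small relative to the structure being sought.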

