__author__ = 'Ricardo Pasquini'

# Counting tweets at the hexagon level. Complete procedure

Overview: This code counts georeferenced tweets at the hexagon level. Hexagons are uniquely identified in space using Uber's H3 library.  
The code also predicts a home location for each user. This later allows counting tweets by whether they were posted by residents, residents of neighboring hexagons, or non-residents.
The project is designed to scale to millions of tweets. The processing is done with MongoDB.

By default the project generates the following collections on the Mongo database:

tweets: Collection containing tweets. This collection is modified by incorporating the corresponding hex id. \
users: Collection containing users. This collection is modified by incorporating the home location data.\
hexcounts: Collection containing counts of tweets.


In [1]:
import sys
sys.path.append("../") 
import databasepopulation
import communicationwmongo as commu
import home_location as home
import analysis as a
import pymongo
import pandas as pd
import my_h3_functions as myh3

In [2]:
import importlib
importlib.reload(databasepopulation)
importlib.reload(home)
importlib.reload(a)
importlib.reload(myh3)

<module 'my_h3_functions' from '..\\my_h3_functions.py'>


This version of the code assumes that a different database will be used for each city.


If necessary, check your current working directory with the following command:

In [3]:
pwd

'C:\\Users\\emman\\Documents\\git\\twitter_and_displacement\\notebooks'

# 0. Connect to Mongo and define a specific database

By default, this connects to a Mongo instance running on localhost.
If you are working in a cloud environment, the connection parameters must be changed manually.

This example uses the database name twitter_bog. Recall that each city gets its own database.


In [4]:
db=commu.connecttoLocaldb(database='twitter_bog')

In [5]:
db.Twitter_Data

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'twitter_bog'), 'Twitter_Data')

# 1. Raw data to Mongo
This function loads the files containing tweets into Mongo.
It assumes files are named like ba_2012.csv, that is, the city prefix followed by the year.
Choose the start and end years.
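
As a minimal sketch of the naming convention, and assuming one CSV per year named `<cityprefix>_<year>.csv` with `yearend` inclusive (inferred from the calls in this notebook), the files a given call would look for can be listed as follows. The helper `expected_tweet_files` is hypothetical and is not part of `databasepopulation`.

```python
import os

def expected_tweet_files(path, cityprefix, yearstart, yearend):
    """List the CSV paths implied by the <cityprefix>_<year>.csv convention (assumed)."""
    return [
        os.path.join(path, f"{cityprefix}_{year}.csv")
        for year in range(yearstart, yearend + 1)  # yearend treated as inclusive
    ]

files = expected_tweet_files("data", "bo", 2012, 2015)
print([os.path.basename(f) for f in files])
```

Listing the expected files before a long import makes it easy to spot a missing year or a wrong prefix.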


In [None]:
databasepopulation.populatetweets(db, path='/Users/emman/Box/Twitter data/Latin America/Bogota/', cityprefix='bo', yearstart=2012, yearend=2012)

In [None]:
databasepopulation.populatetweets(db, path='/Users/emman/Box/Twitter data/Latin America/Bogota/', cityprefix='bo', yearstart=2015, yearend=2015)

In [None]:
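# Rename the coordinate fields in every document; update_many replaces the deprecated update(..., multi=True)
db.tweets.update_many({}, {'$rename': {"latitude": "lat", "longitude": "lon"}})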

# 2. Adding Hexids to tweets 
This function adds a hexagon id to each tweet.  

In [6]:
databasepopulation.addhexjob(db)

 iter: 1  time: 0.4697873592376709
 iter: 2  time: 0.40143537521362305
 iter: 3  time: 0.2792544364929199
 ...
 iter: 3464  time: 0.2676990032196045


AutoReconnect: localhost:27017: [WinError 10054] An existing connection was forcibly closed by the remote host
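
A dropped connection like the one above is often transient. One common way to handle it is to retry the failing call with an increasing delay. The sketch below is generic and uses a stand-in exception class so that it runs without a database; with pymongo, `pymongo.errors.AutoReconnect` would be the error to catch. The names `run_with_retries` and `TransientConnectionError` are hypothetical and not part of this project.

```python
import time

class TransientConnectionError(Exception):
    """Stand-in for a driver error such as pymongo.errors.AutoReconnect."""

def run_with_retries(task, retries=3, base_delay=1.0,
                     retry_on=(TransientConnectionError,)):
    """Call task(); on a transient error wait and retry, doubling the delay each time."""
    for attempt in range(retries + 1):
        try:
            return task()
        except retry_on:
            if attempt == retries:
                raise  # give up after the last allowed retry
            time.sleep(base_delay * 2 ** attempt)

# Demonstration with a task that fails twice before succeeding.
calls = {"n": 0}

def flaky_task():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientConnectionError("connection dropped")
    return "done"

result = run_with_retries(flaky_task, retries=3, base_delay=0.01)
print(result, calls["n"])
```

Whether `addhexjob` can safely be restarted from the beginning is not confirmed here, so checking for unprocessed documents afterwards is still worthwhile.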

In [None]:
# Check and fix: the previous job apparently did not run to completion (possibly a bug).
# Count the documents that have not been assigned a hex yet.
db.tweets.count_documents({"hex": {"$exists": False}})

In [None]:
# Process any documents that are still missing a hex
from pymongo.errors import BulkWriteError

cursor = db.tweets.find({"hex": {"$exists": False}})
df = pd.DataFrame(list(cursor))
requests = databasepopulation.add_hexs_and_prepare_bulk_request(df, dataformat='raw')
if requests:  # bulk_write raises InvalidOperation on an empty list
    try:
        db.tweets.bulk_write(requests, ordered=False)
    except BulkWriteError as bwe:
        print(bwe.details)

# 3. Database performance task: Add indexes to tweets

In [None]:
databasepopulation.create_indexes(db)

# 4. Populate Users Collection

In [None]:
databasepopulation.populate_users_collection(db)

# 5. Find Home Job for each user id in the database

In [None]:
print('Users with home location identified', db.users.count_documents({'hex9': { '$exists': True} }))
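
The method implemented in `home_location` is not shown in this notebook. As an illustration only, a common heuristic is to take the hexagon a user tweets from most often as that user's home, provided there are enough tweets to support it. The function name, the `min_tweets` threshold, and the hex ids below are assumptions made for this sketch.

```python
from collections import Counter, defaultdict

def predict_home_hex(tweets, min_tweets=2):
    """Return {user_id: hex_id} using the most frequent hex per user.

    tweets: iterable of (user_id, hex_id) pairs.
    Users whose top hex has fewer than min_tweets tweets get no prediction.
    """
    per_user = defaultdict(Counter)
    for user_id, hex_id in tweets:
        per_user[user_id][hex_id] += 1

    homes = {}
    for user_id, counts in per_user.items():
        hex_id, n = counts.most_common(1)[0]
        if n >= min_tweets:
            homes[user_id] = hex_id
    return homes

sample = [
    ("u1", "hexA"), ("u1", "hexA"), ("u1", "hexB"),
    ("u2", "hexC"),
]
print(predict_home_hex(sample))  # {'u1': 'hexA'}
```

Leaving low-activity users without a prediction is consistent with counting only users for which the `hex9` field exists, as the cell above does.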


# 6. Generate a Hex-level collection including all hexs in the database

By default the collection includes as ids only the ids of hexagons at resolution 9.
Note that many types of hexagons (for example, different resolutions) could coexist in the same collection because hex identifiers are unique. This is not implemented in the current version of the code, though.

In [None]:
databasepopulation.populate_hexcounts_collection(db)

# 7. Count tweets in each hex by residents and non-residents

In [None]:
import analysis as a

In [None]:
a.countandpopulatejob(db)
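
The logic inside `countandpopulatejob` is not shown here. Under the classification described in the overview (residents, residents of neighboring hexagons, non-residents), the counting step could look roughly like the sketch below. The `neighbors` dictionary is a hypothetical stand-in for an H3 neighbor lookup, and treating users without a predicted home as non-residents is an assumption.

```python
from collections import defaultdict

def count_by_residence(tweets, homes, neighbors):
    """Count tweets per hex, split by where the author lives.

    tweets:    iterable of (user_id, hex_id) pairs.
    homes:     {user_id: home_hex_id}; users without a home count as
               non-residents (assumption).
    neighbors: {hex_id: set of adjacent hex_ids} (hypothetical lookup).
    """
    counts = defaultdict(lambda: {"residents": 0, "neighbors": 0, "nonresidents": 0})
    for user_id, hex_id in tweets:
        home = homes.get(user_id)
        if home == hex_id:
            key = "residents"
        elif home in neighbors.get(hex_id, set()):
            key = "neighbors"
        else:
            key = "nonresidents"
        counts[hex_id][key] += 1
    return dict(counts)

tweets = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u3", "A")]
homes = {"u1": "A", "u2": "B"}
neighbors = {"A": {"B"}, "B": {"A"}}
print(count_by_residence(tweets, homes, neighbors))
```

In this toy input, hexagon A receives one tweet of each kind, and hexagon B receives a single tweet from a resident of a neighboring hexagon.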

# 8. Query the DB to create the counts dataframe

In [None]:
import time 
start=time.time()
df=a.hexcountsresults_to_df(db, save=False)
print(time.time()-start)
# if save=True, the resulting dataframe is saved as a pickle in ./hexcountsdf.pkl

In [None]:
df.info()

In [None]:
df.head()

In [None]:
#df.to_csv('C:/Users/Emman/Desktop/Bogota_Hexes_RP.csv')

# Functions for Spatial Analysis
### A. Transform the dataframe with hexids into a geodataframe with hexagons as geometries

In [None]:
hexgdf = myh3.df_with_hexid_to_gdf(df, hexcolname='_id')
hexgdf.plot()

In [None]:
hexgdf.head()

### B. Transform the dataframe with hexids into a geodataframe with centroid points as geometries

#### Points can be used for spatial joins.

In [None]:
centroidsgdf = myh3.df_with_hexid_to_centroids_gdf(hexgdf, hexcolname='_id')
centroidsgdf.rotate(270, origin = (0,0), use_radians=False).plot()

In [None]:
smooth_hexgdf = myh3.kring_smoother(hexgdf, hexcolname='_id',  metric_col='nonresidents')

In [None]:
smooth_hexgdf.info()

In [None]:
smooth_hexgdf.plot(column='nonresidents')
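
The implementation of `myh3.kring_smoother` is not shown in this notebook. One plausible reading of the name is that each hexagon's metric is replaced by an average over the hexagon and its surrounding ring of neighbors. The sketch below illustrates that idea only; it is an assumption, not the module's code, and the `neighbors` dictionary is a hypothetical stand-in for H3's neighbor lookup.

```python
def ring_smooth(values, neighbors):
    """Average each hex's value with the values of its known neighbors.

    values:    {hex_id: number}
    neighbors: {hex_id: iterable of adjacent hex_ids} (hypothetical lookup)
    Neighbors missing from `values` are skipped rather than treated as zero.
    """
    smoothed = {}
    for hex_id, value in values.items():
        ring = [value] + [values[n] for n in neighbors.get(hex_id, ()) if n in values]
        smoothed[hex_id] = sum(ring) / len(ring)
    return smoothed

values = {"A": 10, "B": 0, "C": 2}
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"]}
print(ring_smooth(values, neighbors))  # {'A': 4.0, 'B': 5.0, 'C': 6.0}
```

Skipping neighbors without data avoids pulling edge hexagons toward zero. Treating them as zero instead would be the other reasonable choice and would give lower values at the border of the study area.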