# Hourly Hashtag Tweet Collection
* Iterate over the seed's most recent tweets (restricted by `config.collection.search_languages`)
* Gather all hashtags (should these be filtered?)
* Merge these hashtags with the `config.seed.hashtags`
* Search Twitter for those hashtags and insert those tweets
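
The merge step in the list above can be sketched as a plain set union. This is only an illustration: `merge_hashtags` is a hypothetical helper, and the normalisation (lowercasing and dropping a leading `#`) assumes the configured hashtags may be written differently from the ones stored with the tweets.

```python
def merge_hashtags(collected, seed_hashtags):
    """Return the union of both hashtag collections after normalisation.

    Assumption of this sketch: tags are compared case-insensitively and
    without a leading '#', so '#Python' and 'python' collapse into one entry.
    """
    def normalise(tag):
        return tag.strip().lstrip("#").lower()

    merged = {normalise(t) for t in collected} | {normalise(t) for t in seed_hashtags}
    merged.discard("")  # drop entries that were empty after normalisation
    return merged


hashtags = merge_hashtags(["Python", "data"], ["#python", "#OpenData"])
print(sorted(hashtags))  # ['data', 'opendata', 'python']
```

In the notebook, the second argument would presumably come from the seed hashtags in the configuration.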

Consider creating an index on `created_at` if this query becomes too slow.
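
A minimal sketch of creating that index, assuming the tweets collection is a pymongo `Collection` (as `api_db.col_tweets` appears to be). `ensure_created_at_index` is a hypothetical helper name. Since the query also filters on `user`, a compound index on `user` and `created_at` may serve it better, but that is a guess that should be checked against the query plan.

```python
def ensure_created_at_index(collection):
    """Create a descending index on `created_at` and return its name.

    `collection` is assumed to be a pymongo Collection. The value -1 is the
    same as pymongo.DESCENDING; calling create_index again with an identical
    specification does not create a second index.
    """
    return collection.create_index([("created_at", -1)])


# Usage inside the notebook (requires a live database connection):
# ensure_created_at_index(api_db.col_tweets)
```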

In [None]:
# required imports to access api_db, misc, misc.CONFIG, ...
import sys
sys.path = ['.', '..', '../..'] + sys.path
from collection import *

### Conditional Execution
Each file must check the configuration to decide whether it should run. For some files execution is not optional, but every file should still include this section, even if the check is tautological. Example:
```python
if not misc.CONFIG["collection"]["execute_this_script"]: exit()
```

In [None]:
# Conditional execution
pass

<hr>
<h1 align="center">driver code</h1>

Get the seed ids

In [None]:
from pytictoc import TicToc

In [None]:
with TicToc():
    print("Loading seed_ids from database...", end="", flush=True)
    seed_ids = [s["_id"] for s in api_db.col_users.find({"depth": 0}, {}).limit(len(misc.CONFIG["seed"]["usernames"]))]
    print("got %d seed users, done." % len(seed_ids))

Query the database for the seed tweets created in the last 24 hours (since yesterday at this time)
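
The cell below uses a rolling 24-hour window. If the window should start at midnight of the previous day instead, the timestamp can be truncated as in this sketch. `yesterday_at_midnight` is a hypothetical helper, and it works on naive local time, which is only correct if `created_at` is stored in the same time reference.

```python
import datetime


def yesterday_at_midnight(now=None):
    """Return 00:00:00 of the day before `now` (defaults to the current time)."""
    now = now or datetime.datetime.now()
    one_day_ago = now - datetime.timedelta(days=1)
    # Zero out the time-of-day fields to land on midnight.
    return one_day_ago.replace(hour=0, minute=0, second=0, microsecond=0)


print(yesterday_at_midnight(datetime.datetime(2024, 3, 15, 14, 30)))
# 2024-03-14 00:00:00
```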

In [None]:
yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
print("Yesterday at this time: %s" % yesterday)

In [None]:
yesterday_seed_tweets = list(api_db.col_tweets.find({
    "user": {"$in": seed_ids},
    "created_at": {"$gte": yesterday}
}).limit(10_000)) # in practice this limit is unlikely to be reached except with a very large seed
print("Found %d/10000 seed tweets" % len(yesterday_seed_tweets))

Extract the hashtags from those tweets

In [None]:
hashtags_l = [h.lower() for t in yesterday_seed_tweets for h in dict_key_or_default(t, "hashtags", []) ]

In [None]:
from collections import Counter

In [None]:
print("hashtag counter: %s" % Counter(hashtags_l))

In [None]:
hashtags = set(hashtags_l)
print("Found a total of %d unique hashtags" % len(hashtags))

### Perform the search

#### Search goals
* between yesterday (`since`) and today (`until`), in `YYYY-MM-DD` format
* perform the search once for each hashtag
* perform the search once for each language
* result_type `mixed`: recent and popular
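
The goals above amount to one search per (language, hashtag) pair over a fixed date window. The sketch below only builds the list of parameter sets and performs no search. `build_search_plan` and the dictionary keys are hypothetical, loosely mirroring the keyword arguments passed to `search_hashtag` in this notebook.

```python
import datetime
from itertools import product


def build_search_plan(hashtags, languages, today=None):
    """Return one parameter dict per (language, hashtag) combination."""
    today = today or datetime.date.today()
    since = (today - datetime.timedelta(days=1)).strftime("%Y-%m-%d")
    until = today.strftime("%Y-%m-%d")
    # "und" (undetermined language) is excluded, as in the search cell below.
    langs = sorted(set(languages) - {"und"})
    return [
        {"hashtag": h, "lang": lang, "since": since, "until": until,
         "result_type": "mixed"}
        for lang, h in product(langs, sorted(hashtags))
    ]


plan = build_search_plan({"python"}, ["en", "pt", "und"],
                         today=datetime.date(2024, 3, 15))
for params in plan:
    print(params)
```

The number of searches grows as the product of languages and hashtags, which is worth keeping in mind with respect to API rate limits.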

In [None]:
s_yesterday = yesterday.strftime("%Y-%m-%d")
s_today = datetime.date.today().strftime("%Y-%m-%d")
print("yesterday: %s, today: %s" % (s_yesterday, s_today))

In [None]:
langs = list(set(misc.CONFIG["collection"]["search_languages"]) - {"und"})
for lang in langs:
    print("Searching for tweets in language=%s" % lang)
    for h in hashtags:
        print("  with hashtag [#%s]..." % h, end="", flush=True)
        tweets = search_hashtag(h, since=s_yesterday, until=s_today, lang=lang)
        insert_tweets(tweets)
        print("got %d tweets, done." % len(tweets))

In [None]:
print("DONE")