### I took the cURL commands from the SMAT API website and rewrote them in Python. This lets me pull data from the supported SMAT sites into Python and analyze it in different ways. For example, the JSON output below contains the daily counts of posts matching the term "polls" on .win sites from 2022-11-12 to 2022-12-12 (shown in the doc_count field). This is time series data, as the /timeseries URL indicates.

In [10]:
import requests
import pandas as pd

url = "https://api.smat-app.com/timeseries"

params = {
    "term": "polls",
    "interval": "day",
    "site": "win",
    "since": "2022-11-12",
    "until": "2022-12-12",
    "changepoint": "false",
    "esquery": "false"
}

response = requests.get(url, params=params)
data = response.json()  # parsed JSON (a dict), not a DataFrame yet
print(data)


{'created_key': 'timestamp', 'took': 770, 'timed_out': False, '_shards': {'total': 16, 'successful': 16, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 1013, 'relation': 'eq'}, 'max_score': None, 'hits': []}, 'aggregations': {'timestamp': {'buckets': [{'key_as_string': '11/12/22(Sat)00:00:00', 'key': 1668211200000, 'doc_count': 74}, {'key_as_string': '11/13/22(Sun)00:00:00', 'key': 1668297600000, 'doc_count': 78}, {'key_as_string': '11/14/22(Mon)00:00:00', 'key': 1668384000000, 'doc_count': 92}, {'key_as_string': '11/15/22(Tue)00:00:00', 'key': 1668470400000, 'doc_count': 89}, {'key_as_string': '11/16/22(Wed)00:00:00', 'key': 1668556800000, 'doc_count': 66}, {'key_as_string': '11/17/22(Thu)00:00:00', 'key': 1668643200000, 'doc_count': 29}, {'key_as_string': '11/18/22(Fri)00:00:00', 'key': 1668729600000, 'doc_count': 26}, {'key_as_string': '11/19/22(Sat)00:00:00', 'key': 1668816000000, 'doc_count': 36}, {'key_as_string': '11/20/22(Sun)00:00:00', 'key': 1668902400000, 'doc_count

### Here is another use case of the SMAT API: text content. This will probably be the most useful for my analysis. Here, I worked out how to transform the response into a dataframe. From what I can tell, the endpoint doesn't seem to have a row limit; I have tested up to 100,000 rows.

In [35]:
url = "https://api.smat-app.com/content"

params = {
    "term": "trump",
    "limit": 5000,
    "site": "win",
    "since": "2022-12-07",
    "until": "2022-12-08",
    "esquery": "false",
    "sortdesc": "false"
}

headers = {
    "accept": "application/json"
}

responses = requests.get(url, params=params, headers=headers)
json_output = responses.json()

# Parse the JSON hits and make 'community', 'content', 'author', and 'timestamp' columns.
# Building all rows first is much faster than appending to the dataframe one row at a time.
columns = ['community', 'content', 'author', 'timestamp']
rows = [[hit['_source'][col] for col in columns] for hit in json_output['hits']['hits']]
df2 = pd.DataFrame(rows, columns=columns)

#convert timestamp column to readable time format
df2['timestamp'] = pd.to_datetime(df2['timestamp'], unit='s')

# Remove <p> and </p> tags from the text (literal match, not a regex)
df2['content'] = df2['content'].str.replace('<p>', '', regex=False)
df2['content'] = df2['content'].str.replace('</p>', '', regex=False)
display(df2)





           community                                            content               author            timestamp
0          thedonald  It's the company not Trump. Trump is not being...             Agent_86  2022-12-07 00:46:39
1          thedonald                                            Trump\n        BasedBabe1776  2022-12-07 02:33:53
2          thedonald                                            trump\n               gabman  2022-12-07 11:24:48
3          thedonald  Walker was endorsed because of his history wit...  GhostOfJebsCampaign  2022-12-07 04:04:49
4          thedonald                                     thanks Trump\n             cyberwar  2022-12-07 10:45:58
...              ...                                                ...                  ...                  ...
1225  greatawakening  Because this isn't the first time these exact ...           cathole953  2022-12-07 15:10:42
1226  greatawakening  If you see an article/video here on Unleashed ...           ashlanddog  2022-12-07 05:01:42
1227       thedonald  No End to the Corruption\nJeffery Tucker\nJEFF...                 BKav  2022-12-07 03:06:05
1228       thedonald  How Long Will Elon Survive?\nJeffery Tucker\nJ...                 BKav  2022-12-07 23:04:04
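
### The two str.replace calls above only remove <p> tags. Below is a minimal sketch of a more general cleanup, assuming the content field may also contain other simple HTML tags or entities (an assumption; the output above only shows <p>). The helper name strip_html is hypothetical, and a regex is a rough tool here, not a full HTML parser.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <p>, </p>, <br/>.
# Caveat: plain text such as "a < b > c" would also be matched.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html(text):
    """Remove tag-like substrings, decode entities such as &amp;, and trim whitespace."""
    without_tags = TAG_PATTERN.sub("", text)
    return html.unescape(without_tags).strip()


print(strip_html("<p>Nice!!!</p>"))  # Nice!!!
```

### With the dataframe from the cell above, this could be applied as df2['content'] = df2['content'].map(strip_html). Note that it also trims the trailing newlines visible in the output.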


### Here, I transform the timeseries data into a dataframe. This parsing was a bit easier.


In [23]:
url = "https://api.smat-app.com/timeseries"

params = {
    "term": "trump",
    "interval": "day",
    "site": "win",
    "since": "2022-10-12",
    "until": "2022-12-12",
    "changepoint": "false",
    "esquery": "false"
}

response = requests.get(url, params=params)
json_data = response.json()


buckets = json_data['aggregations']['timestamp']['buckets']

#dataframe
df3 = pd.DataFrame(buckets)

#make the df more legible
df3 = df3.drop('key', axis=1)
df3['key_as_string'] = df3['key_as_string'].str.replace('00:00:00', '')
df3 = df3.rename(columns={'key_as_string': 'date', 'doc_count': 'mentions'})

display(df3)

             date  mentions
0   10/12/22(Wed)       715
1   10/13/22(Thu)       779
2   10/14/22(Fri)       714
3   10/15/22(Sat)       601
4   10/16/22(Sun)       606
..            ...       ...
56   12/7/22(Wed)      1230
57   12/8/22(Thu)       623
58   12/9/22(Fri)       479
59  12/10/22(Sat)       622
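
### As an alternative to trimming key_as_string, here is a minimal sketch that builds a real date from each bucket's numeric 'key'. It assumes 'key' is epoch milliseconds in UTC, which matches the first bucket printed earlier (1668211200000 alongside 11/12/22 00:00:00). The helper name bucket_date is hypothetical.

```python
from datetime import datetime, timezone


def bucket_date(bucket):
    """Convert a bucket's 'key' (assumed epoch milliseconds, UTC) to an ISO date string."""
    moment = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc)
    return moment.date().isoformat()


# First bucket from the "polls" timeseries output earlier in this notebook.
sample = {"key_as_string": "11/12/22(Sat)00:00:00", "key": 1668211200000, "doc_count": 74}
print(bucket_date(sample))  # 2022-11-12
```

### In pandas, the equivalent would be pd.to_datetime(df3['key'], unit='ms'), run before the 'key' column is dropped. Real dates sort and plot correctly, which the string form does not guarantee.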


### Unfortunately, I found out that the 'site' parameter accepts many different values (rumble_video, rumble_comment, bitchute_video, bitchute_comment, rutube_video, rutube_comment, tiktok_video, tiktok_comment, lbry_video, lbry_comment, 8kun, 4chan, gab, parler, win, poal, telegram, kiwifarms, gettr, wimkin, mewe, minds, vk, truth_social) and that their JSON responses need to be parsed differently. I will eventually go through all of them but, as you can see below for 4chan, the buckets lookup is slightly different: the aggregation key is 'now' rather than 'timestamp'.

In [36]:
url = "https://api.smat-app.com/timeseries"

params = {
    "term": "trump",
    "interval": "day",
    "site": "4chan",
    "since": "2022-10-10",
    "until": "2022-12-12",
    "changepoint": "false",
    "esquery": "false"
}

response = requests.get(url, params=params)
json_datas = response.json()

buckets = json_datas['aggregations']['now']['buckets']
df4 = pd.DataFrame(buckets)

#drop columns and make date more readable
df4 = df4.drop('key', axis=1)
df4['key_as_string'] = df4['key_as_string'].str.replace('00:00:00', '')
df4 = df4.rename(columns={'key_as_string': 'date', 'doc_count': 'mentions'})
display(df4)

             date  mentions
0   10/10/22(Mon)       927
1   10/11/22(Tue)      1079
2   10/12/22(Wed)      1009
3   10/13/22(Thu)      1307
4   10/14/22(Fri)       239
..            ...       ...
58   12/7/22(Wed)       616
59   12/8/22(Thu)       553
60   12/9/22(Fri)       506
61  12/10/22(Sat)       509
62  12/11/22(Sun)       521

[63 rows x 2 columns]
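
### Since the aggregation key differs by site ('timestamp' for win, 'now' for 4chan), here is a minimal sketch that avoids hard-coding it by picking whichever aggregation holds a 'buckets' list. It assumes each timeseries response has exactly one such aggregation, which is true of the two responses above but not verified for the other sites. The helper name find_buckets is hypothetical.

```python
def find_buckets(payload):
    """Return the bucket list from whichever aggregation the response contains."""
    for aggregation in payload.get("aggregations", {}).values():
        if isinstance(aggregation, dict) and "buckets" in aggregation:
            return aggregation["buckets"]
    raise KeyError("no aggregation with a 'buckets' list found")


# Trimmed, hand-made samples shaped like the win and 4chan responses above.
win_like = {"aggregations": {"timestamp": {"buckets": [{"doc_count": 74}]}}}
chan_like = {"aggregations": {"now": {"buckets": [{"doc_count": 927}]}}}

print(find_buckets(win_like))   # [{'doc_count': 74}]
print(find_buckets(chan_like))  # [{'doc_count': 927}]
```

### The result can be passed straight to pd.DataFrame, as in the cells above.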


### WIP for truth_social parsing ###

In [44]:
url = "https://api.smat-app.com/content"

params = {
    "term": "nice",
    "limit": 1,
    "site": "truth_social",
    "since": "2022-12-08",
    "until": "2022-12-09",
    "esquery": "false",
    "sortdesc": "false"
}

headers = {
    "accept": "application/json"
}

responses = requests.get(url, params=params, headers=headers)
firey = responses.json()
print(firey)



{'created_key': 'created_at', 'content_key': 'content_cleaned', 'took': 1666, 'timed_out': False, '_shards': {'total': 7, 'successful': 7, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 96, 'relation': 'eq'}, 'max_score': 9.862146, 'hits': [{'_index': 'smat-truthsocial-data-000007', '_id': '109480411827955943', '_score': 9.862146, '_source': {'account': {'acct': 'AZChris11', 'display_name': 'AZChris', 'id': '107838368043245844', 'username': 'AZChris11'}, 'bookmarked': False, 'card': None, 'collected_by': 'smat-scrapy-crawlers', 'content': '<p>Nice!!!</p>', 'content_cleaned': 'Nice!!!', 'created_at': '2022-12-08T22:28:30.554+00:00', 'datatype': 'comment', 'emojis': [], 'favourited': False, 'favourites_count': 0, 'id': '109480411827955943', 'in_reply_to_account_id': '107804517036100106', 'in_reply_to_id': '109480126795920446', 'language': 'en', 'last_seen_ts': '2022-12-11T07:19:34.698368+00:00', 'media_attachments': [], 'mentions': [{'acct': 'dbongino', 'id': '107804517036100106
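
### The response above carries top-level 'content_key' and 'created_key' entries that name fields inside '_source'. Here is a minimal sketch of a parser that uses them, assuming they always point at the text and time fields (true in this sample, not verified for other sites). The fallbacks to 'content', 'timestamp' and 'author' are assumptions based on the win response, and hit_to_row is a hypothetical name.

```python
def hit_to_row(hit, payload):
    """Flatten one hit into a row, using the keys the response itself advertises."""
    source = hit["_source"]
    content_key = payload.get("content_key", "content")    # fallback is an assumption
    created_key = payload.get("created_key", "timestamp")  # fallback is an assumption
    # truth_social nests the author under 'account'; win uses a flat 'author' field.
    account = source.get("account") or {}
    author = source.get("author") or account.get("username")
    return {
        "author": author,
        "content": source.get(content_key),
        "created": source.get(created_key),
    }


# Trimmed copy of the truth_social response printed above.
sample_payload = {
    "created_key": "created_at",
    "content_key": "content_cleaned",
    "hits": {"hits": [{"_source": {
        "account": {"acct": "AZChris11", "username": "AZChris11"},
        "content": "<p>Nice!!!</p>",
        "content_cleaned": "Nice!!!",
        "created_at": "2022-12-08T22:28:30.554+00:00",
    }}]},
}

rows = [hit_to_row(hit, sample_payload) for hit in sample_payload["hits"]["hits"]]
print(rows)
```

### Note that the time formats differ: truth_social returns an ISO 8601 string here, while the win timestamps were converted with unit='s', so the 'created' column would still need per-site conversion.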

### The end goal is to combine all of the sites (or a selected handful) into a single dataframe. This is doable; it just needs a little more work on my end.
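
### A minimal sketch of how that combination step could look, assuming each site already has its own parser that returns a list of row dicts. Both combine_sites and fetch_rows are hypothetical names; fetch_rows stands in for "request one site and parse its response", and the fake version below returns made-up rows so the sketch runs without network access.

```python
def combine_sites(sites, fetch_rows):
    """Collect rows from every site into one list, tagging each row with its site."""
    combined = []
    for site in sites:
        for row in fetch_rows(site):
            combined.append({"site": site, **row})
    return combined


def fake_fetch_rows(site):
    """Stand-in fetcher with made-up data; a real one would call the API for `site`."""
    return [{"author": f"user_on_{site}", "content": "example"}]


rows = combine_sites(["win", "truth_social"], fake_fetch_rows)
print(len(rows))  # 2
```

### The combined list can be turned into one dataframe with pd.DataFrame(rows), and the 'site' column keeps track of where each row came from.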