# PYCT: CrowdTangle data retrieval in Python

Full API documentation at [github.com/CrowdTangle/API/wiki](https://github.com/CrowdTangle/API/wiki).

In [None]:
import pyct

----
### `pyct.auth()`

Connect to one of your dashboards through its API key.

In [None]:
pyct.auth()

----
### `pyct.lists()`

See which lists you have in the dashboard.

In [None]:
pyct.lists()

----
### `pyct.getlists()`

This gets data from all "Lists" in the current CT dashboard.

- Gets 100 posts per call.
- Set a pause of at least 10 seconds between calls to avoid hitting the rate limit (6 calls per minute).
- Gets data from all Lists within the current dashboard, but not from any Saved Searches.
- Saves data iteratively to a growing csv while paging through results.
- Gets a maximum of 10k posts for the given slice of time. To get all the data, set the start and/or end dates manually so that each slice holds fewer than 10k results.
- If you have lots of data for every day, note that `startdate` and `enddate` can be set down to the second, using this format: `2018-12-28T07:23:05`.
- If you get no output, try a different date range.

Format:
`pyct.getlists(startdate,enddate,pause)`

In [None]:
pyct.getlists('2019-01-01T07:00:01', '2019-01-01T07:00:02',10)
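
One way to stay under the 10k ceiling described above is to cut a long period into shorter slices and call `pyct.getlists()` once per slice. The sketch below only builds the date pairs; `date_slices` and `FMT` are hypothetical names, not part of `pyct`, and the step size is something you would tune to your own data volume.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S"  # same format as the startdate/enddate examples above


def date_slices(start, end, step):
    """Split the period from start to end into consecutive (startdate, enddate) pairs."""
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")
    cur = datetime.strptime(start, FMT)
    stop = datetime.strptime(end, FMT)
    slices = []
    while cur < stop:
        nxt = min(cur + step, stop)  # never run past the requested end
        slices.append((cur.strftime(FMT), nxt.strftime(FMT)))
        cur = nxt
    return slices


slices = date_slices("2019-01-01T00:00:00", "2019-01-02T00:00:00", timedelta(hours=6))
for startdate, enddate in slices:
    print(startdate, enddate)
    # pyct.getlists(startdate, enddate, 10)  # uncomment to fetch each slice
```

Adjacent slices share a boundary timestamp, so a post published at exactly that second might appear twice; it is worth deduplicating afterwards. It is also worth checking whether repeated calls append to or overwrite the output csv before running a long loop.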

----
### `pyct.getsearch()`

This gets data from "Saved Searches" in the current CT dashboard.

- Gets 100 posts per call.
- Set a pause of at least 10 seconds between calls to avoid hitting the rate limit (6 calls per minute).
- Gets data from the Saved Searches that you specify by id number. Pass the `id` parameter as a string: `'1421573'` for a single Saved Search, or comma-separated, `'1421573,1426545'`, for two or more.
- Saves data iteratively to csv while paging through results.
- Gets a maximum of 10k posts for a given slice of time. Set the start and/or end dates manually so that each slice holds fewer than 10k results.
- If you have lots of data for every day, note that `startdate` and `enddate` can be set down to the second, using this format: `2018-12-28T07:23:05`.
- If you get no output, try a different date range.

Format:
`pyct.getsearch(id,startdate,enddate,pause)`

In [None]:
# Inspect id numbers of your Saved Searches and use one of your SAVED_SEARCH ids below
pyct.lists()

In [None]:
pyct.getsearch('1426545,1444036','2020-07-30','2020-10-15',10)
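
If you keep your Saved Search ids in a Python list, the comma-separated string that the `id` parameter expects can be built rather than typed by hand. This is a small convenience sketch; `format_search_ids` is a hypothetical helper, not a `pyct` function.

```python
def format_search_ids(ids):
    """Join Saved Search ids into one comma-separated string without spaces."""
    return ",".join(str(i).strip() for i in ids)


# Ids may be stored as ints or strings; both end up in the same string form.
id_param = format_search_ids([1426545, "1444036"])
print(id_param)  # 1426545,1444036
# pyct.getsearch(id_param, '2020-07-30', '2020-10-15', 10)
```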

----
### `pyct.getlistsfast()`

This function gets data from all "Lists" in the current CT dashboard. It is meant for users who have been allowed by CT to get more than 100 posts per call, and is written for a use case where the limit is assumed to be 10,000 posts.

- Gets 10,000 posts per call.
- Gets data from all Lists within the current dashboard, but not from any Saved Searches.
- Does not use the paging method to advance through results, but instead uses a method where the end date of a previous query becomes the starting point for the following query.
- Set a pause of at least 10 seconds between calls to avoid hitting the rate limit (6 calls per minute).
- Saves data iteratively to csv while retrieving new results.
- Gets a maximum of 60k posts per minute (6 calls of 10,000 posts each).

Format:
`pyct.getlistsfast(pause)`

In [None]:
pyct.getlistsfast(15) # got 70k with 10 sec
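
The date-based method mentioned above (where one query's boundary date feeds the next query) can be sketched generically. This is not `pyct`'s actual code: `fetch_by_date_cursor` and `fetch_batch` are hypothetical names, and the sketch assumes that results come back newest first and that each post carries an `id` and an ISO-formatted `date`. A small in-memory `fake_fetch` stands in for the API so the loop can be run as is.

```python
import time


def fetch_by_date_cursor(fetch_batch, start_date, end_date, count, pause=0):
    """Collect posts by moving the end date back to the oldest post seen so far.

    fetch_batch(start_date, end_date, count) is a hypothetical callable that
    returns up to `count` posts (dicts with 'id' and 'date'), newest first.
    """
    posts, seen = [], set()
    while True:
        batch = fetch_batch(start_date, end_date, count)
        new = [p for p in batch if p["id"] not in seen]
        if not new:
            break  # nothing unseen came back, so stop
        posts.extend(new)
        seen.update(p["id"] for p in new)
        if len(batch) < count:
            break  # a short batch means the date range is exhausted
        # The oldest date in this batch becomes the end date of the next query.
        end_date = min(p["date"] for p in batch)
        time.sleep(pause)  # respect the rate limit between calls
    return posts


# Fake data standing in for the API: 25 posts, one per second.
FAKE = [{"id": i, "date": f"2020-10-01T00:00:{i:02d}"} for i in range(25)]


def fake_fetch(start_date, end_date, count):
    hits = [p for p in FAKE if start_date <= p["date"] <= end_date]
    return sorted(hits, key=lambda p: p["date"], reverse=True)[:count]


result = fetch_by_date_cursor(
    fake_fetch, "2020-10-01T00:00:00", "2020-10-01T00:00:59", count=10
)
print(len(result))  # 25
```

Because the boundary date is queried again in the next call, the sketch tracks ids it has already seen so that overlapping posts are not stored twice.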

----
### `pyct.search()`

Users who have been given access by CT to the `GET /posts/search` endpoint can use this method for getting data. Access is granted based on the use case and on a dashboard-by-dashboard basis. You only have access if you are using an API token that comes from a dashboard with the search API activated.

This function retrieves posts from pages, groups, and profiles for a given set of parameters and search terms.

Note that this endpoint, unlike the `GET /posts` endpoint, searches the entire, cross-platform CrowdTangle system of posts. It can be limited by lists and accounts, but by default it searches beyond the dashboard that the token is associated with. This means that if you are using the `GET /posts/search` endpoint, it does not matter which lists you have added to your dashboard.

Searches can be customised in different ways, described in the [API documentation](https://github.com/CrowdTangle/API/wiki/Search).

This `pyct` function will save each batch of retrieved data as a separate csv file.

Format:
`pyct.search(count,startdate,enddate,sort_by,pause,searchterms)`

- Set `count` to 100, or to your allowed number.
- Set `startdate` and `enddate` for the search.
- Set `sort_by` to one of: 'date', '[interaction_rate](https://help.crowdtangle.com/en/articles/1141064-how-is-interaction-rate-calculated)', 'total_interactions', '[overperforming](https://help.crowdtangle.com/en/articles/1141056-how-is-overperforming-calculated)', or 'underperforming'.
- Set a `pause` of at least 10 seconds between calls to avoid hitting the rate limit (6 calls per minute).
- Set the `searchterms`. Format: 'data, datascience, data science'. The comma-separated terms are joined by the OR operator, and each term is treated as a phrase: "data, data science" matches posts containing either "data" or "data science", but not posts containing only "science". See the [API documentation](https://github.com/CrowdTangle/API/wiki/Search) for AND-searches and other customisations.


In [None]:
pyct.search(10000,'2020-09-30', '2020-10-02','date',10,'covid,corona,trump')
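
Since a mistyped sorting method only shows up once the call is made, it can help to check the arguments first. The sketch below validates `sort_by` against the values listed above and joins a list of phrases into the comma-separated `searchterms` string; `search_args` and `ALLOWED_SORT` are hypothetical names, not part of `pyct`.

```python
ALLOWED_SORT = {
    "date",
    "interaction_rate",
    "total_interactions",
    "overperforming",
    "underperforming",
}


def search_args(count, startdate, enddate, sort_by, pause, terms):
    """Return the positional arguments for pyct.search() after basic checks."""
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"sort_by must be one of {sorted(ALLOWED_SORT)}")
    # Each list item is one phrase; the phrases are OR-ed by the search.
    searchterms = ",".join(t.strip() for t in terms if t.strip())
    return (count, startdate, enddate, sort_by, pause, searchterms)


args = search_args(
    100, "2020-09-30", "2020-10-02", "date", 10, ["covid", "corona", "trump"]
)
print(args)
# pyct.search(*args)  # uncomment to run the search
```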

----
### `pyct.joinsearchcsvs()`

Because `pyct.search()` writes its results to several csvs, the function below offers an automated way of joining them together.

The function takes all csvs in your '_data' directory that were produced by `pyct.search()`, that is, files whose names start with 'pyct-data-search-*'. Make sure that only the files you want to merge are present in the directory.

Note that the function does not delete the individual csvs; it creates an additional, merged version of them.

In [None]:
pyct.joinsearchcsvs()
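
For readers who want to see what such a merge involves, here is a minimal standard-library sketch. It is not the implementation of `pyct.joinsearchcsvs()`: the function name `join_csvs`, the `.csv` suffix in the pattern, and the output name `merged.csv` are assumptions made for illustration. The demo writes two small files to a temporary directory so it can run anywhere.

```python
import csv
import glob
import os
import tempfile


def join_csvs(directory, pattern="pyct-data-search-*.csv", out_name="merged.csv"):
    """Concatenate matching csvs into one file, writing the header only once."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    out_path = os.path.join(directory, out_name)
    header, n_rows = None, 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"column mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return out_path, n_rows


# Demo with two tiny files; the originals are left in place, as described above.
with tempfile.TemporaryDirectory() as d:
    batches = [[["1", "a"]], [["2", "b"], ["3", "c"]]]
    for n, rows in enumerate(batches):
        name = os.path.join(d, f"pyct-data-search-{n}.csv")
        with open(name, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows([["id", "message"]] + rows)
    out_path, merged_rows = join_csvs(d)
    print(merged_rows)  # 3
```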