# What is an API?

## Client and server

An API (Application Programming Interface) is an agreement between two parties about how they will talk to each other. These parties are called the client and the server.

**Server** is the side that has interesting information (or something else of interest) and lets others on the Internet make use of it. A server is a program that runs continuously on a computer and is ready to respond to requests from anyone else on the Internet.

**Client** is a program that sends requests to a server and tries to assemble something useful from the responses. A client can therefore be a mobile app with cloud and sunshine icons, or our web browser in which we open an exchange rate list. But it can also be a robot that retrieves information about products in e-shops on behalf of a price comparison aggregator website.

_We will not focus on the server side in these materials._

# Basic concepts

Before we start creating a client, let's go through some basic concepts around APIs.

## Protocol

All communication between the client and the server follows a so-called protocol. This is nothing more than an agreed-upon convention about who sends what to whom and what structure it has. There are many protocols in the computing world, but we will only be interested in HTTP, because it is used by web APIs and by the web itself. It is no coincidence that website addresses in the browser usually start with http:// (or https://).

### HTTP

Communication between the client and the server takes the form of an HTTP request, which the client sends to the server, and an HTTP response, which the server sends back. Each of these messages has its own essential parts.

### Request

+ **method** (HTTP method): For example, the GET method only reads data; we cannot change anything through the API with it, which is why it is called *safe*. Besides GET, there are the POST (create), PUT (update) and DELETE (delete) methods, which we will not need, because we will only retrieve data from the API.
+ **address with parameters** (URL with query parameters): The usual URL is followed by a question mark and then the parameters. Multiple parameters are separated by the `&` character. The address itself usually determines what data we are asking for (in our example, movies), and the URL parameters let the server filter the data so that we get only what really interests us (in our case, dramas 150 minutes long).

        http://api.example.com/movies/
        http://api.example.com/movies?genre=drama&duration=150

+ **headers**: Headers are really just additional parameters. The difference is that we do not send them as part of the address and, unlike URL parameters, they are subject to standardization and conventions.
+ **body**: The body of the message is a box that we send along with the request and into which we can put whatever we want, preferably something the API on the other side will understand. The body may be empty. It can contain plain text, data in some format, or even an image. So that the API on the other side knows what is in the box and how to unpack it, a `Content-Type` header usually has to be sent along with the body.

How to compose a request correctly for a specific API is something we have to read in its documentation.
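To make the parts of a request concrete, here is a minimal sketch that assembles the text of a GET request by hand. It assumes the fictional `api.example.com` address from the example above; the function name `build_get_request` is our own, and in practice a library such as Requests composes this message for us.

```python
from urllib.parse import urlencode


def build_get_request(host: str, path: str, params: dict, headers: dict) -> str:
    """Compose the text of a simple HTTP/1.1 GET request (illustration only)."""
    # Query parameters go after "?" and are joined with "&"
    query = urlencode(params)
    target = f"{path}?{query}" if query else path
    lines = [f"GET {target} HTTP/1.1", f"Host: {host}"]
    # Each header is one "Name: value" line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line ends the headers; this GET request has an empty body
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_get_request(
    "api.example.com",
    "/movies",
    {"genre": "drama", "duration": 150},
    {"Accept": "application/json"},
)
print(request_text)
```

The first line carries the method and the address with parameters, the following lines are headers, and anything after the empty line would be the body.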

### Response

+ **status code**: A numeric code by which the API indicates how the request was processed. By their first digit, the codes are divided into categories:
    + 1xx - informational response (the request was accepted, but processing continues)
    + 2xx - the request was received and processed correctly
    + 3xx - redirection; the client needs to send another request elsewhere to get the response
    + 4xx - error on the client side (we composed the request wrongly)
    + 5xx - error on the server side (the API failed to respond)
+ **headers**: Information about the response, such as the processing date, the response format, ...
+ **body**: The body of the response, which is usually what we are most interested in
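Because the category is given by the first digit, it can be computed with integer division by 100. A small sketch; the function name and the category labels are our own.

```python
def status_category(code: int) -> str:
    """Return the category of an HTTP status code based on its first digit."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 gives the first digit of a three-digit code
    return categories.get(code // 100, "unknown")


for code in (100, 200, 301, 404, 503):
    print(code, status_category(code))
```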

### Formats

The body can be in any format: text, HTML, an image, a PDF file, or anything else. The format is announced by the `Content-Type` header, whose value goes by several names: content type, media type, MIME type. Most often it consists only of a type and a subtype, separated by a slash. Some examples:

+ `text/plain` - plain text
+ `text/html` - HTML
+ `text/csv` - CSV
+ `image/gif` - GIF image
+ `image/jpeg` - JPEG image
+ `image/png` - PNG image
+ `application/json` - JSON
+ `application/xml` or `text/xml` - XML
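A `Content-Type` value can also carry parameters after a semicolon, such as `charset`. A simplified sketch that splits the value into type, subtype and parameters; `parse_media_type` is a hypothetical helper, and it ignores the full quoting rules of the standard.

```python
def parse_media_type(content_type: str) -> tuple:
    """Split a Content-Type value into (type, subtype, parameters)."""
    # Parameters such as charset follow the media type after ";"
    media_type, *raw_params = content_type.split(";")
    main_type, _, subtype = media_type.strip().partition("/")
    params = {}
    for raw in raw_params:
        if not raw.strip():
            continue  # tolerate a trailing ";"
        name, _, value = raw.strip().partition("=")
        params[name.lower()] = value.strip('"')
    # Type and subtype are case-insensitive, so normalize them to lower case
    return main_type.lower(), subtype.lower(), params


print(parse_media_type("application/json; charset=utf-8"))
print(parse_media_type("text/html"))
```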

### JSON format

JSON was created around 2000 and soon became a more concise replacement for XML, especially on the web and in web APIs. Today it is **probably the most popular format for general structured data**. Its author is Douglas Crockford, one of the people involved in the development of the JavaScript language.

Its popularity probably stems from its simplicity. After all, this Jupyter notebook itself is saved in JSON format. The full specification is described with a few diagrams on the [json.org](https://www.json.org/json-cz.html) page.

#### JSON is a data format NOT a data type!

The input can be any of these data structures:

+ number
+ string
+ boolean (truth value)
+ array (list)
+ object (dictionary)
+ null (`None` in Python)

The output is always a string.

![title](static/null.jpg)

Python (like many other languages) supports JSON in its basic installation, through the built-in `json` module.
In Python, JSON is easily confused with a dictionary. Keep in mind, however, that JSON is text that can be stored in a file or sent over HTTP, but it cannot be worked with directly as data in a program. We always have to parse it into dictionaries and lists first.
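A short illustration of the difference between JSON text and Python data: `json.dumps` turns Python objects into a string (Python's `False` and `None` become JSON's `false` and `null`), and `json.loads` turns that string back into ordinary dictionaries and lists. The sample data is made up.

```python
import json

python_data = {"name": "John Smith", "is_employee": False, "manager": None, "emails": ["johns@gmail.com"]}

# json.dumps serializes Python objects into a JSON string
text = json.dumps(python_data)
print(type(text), text)

# json.loads parses the string back into dictionaries and lists
parsed = json.loads(text)
print(parsed == python_data)
```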

In [1]:
import json

In the following JSON, under the "people" key, there is a list of dictionaries with further nested structure:

In [2]:
people_info = '''
{
    "people": [
        {
            "name": "John Smith",
            "phone": "555-246-999",
            "email": ["johns@gmail.com", "jsmith@gmail.com"],
            "is_employee": false
        },
        {
            "name": "Jane Doe",
            "phone": "665-296-659",
            "email": ["janed@gmail.com", "djane@gmail.com"],
            "is_employee": null
        }
    ]
}
'''

`json.loads` converts a JSON string into Python objects (here, a dictionary)

In [3]:
data = json.loads(people_info)

In [4]:
data

{'people': [{'name': 'John Smith',
   'phone': '555-246-999',
   'email': ['johns@gmail.com', 'jsmith@gmail.com'],
   'is_employee': False},
  {'name': 'Jane Doe',
   'phone': '665-296-659',
   'email': ['janed@gmail.com', 'djane@gmail.com'],
   'is_employee': None}]}

In [5]:
type(data)

dict

In [6]:
type(data['people'])

list

In [7]:
type(data['people'][0])

dict

In [8]:
data['people']

[{'name': 'John Smith',
  'phone': '555-246-999',
  'email': ['johns@gmail.com', 'jsmith@gmail.com'],
  'is_employee': False},
 {'name': 'Jane Doe',
  'phone': '665-296-659',
  'email': ['janed@gmail.com', 'djane@gmail.com'],
  'is_employee': None}]

In [9]:
data['people'][0]

{'name': 'John Smith',
 'phone': '555-246-999',
 'email': ['johns@gmail.com', 'jsmith@gmail.com'],
 'is_employee': False}

In [10]:
data['people'][0]['name']

'John Smith'
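Indexing one level at a time combines naturally with a loop over the list. A sketch using a shortened copy of the `people_info` JSON above (only `name` and `email` kept, so the cell is self-contained):

```python
import json

people_info = '''
{"people": [
    {"name": "John Smith", "email": ["johns@gmail.com", "jsmith@gmail.com"]},
    {"name": "Jane Doe", "email": ["janed@gmail.com", "djane@gmail.com"]}
]}
'''

data = json.loads(people_info)

# Walk the list under "people" and take values out of each dictionary
names = [person["name"] for person in data["people"]]
first_emails = {person["name"]: person["email"][0] for person in data["people"]}
print(names)
print(first_emails)
```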

#### Task
Download the JSON from http://pyvec.org/cs/api.json and write code that lists the names of the board members of the Czech Python organization (Pyvec).

*Tip: the JSON can also be downloaded directly from the notebook using the `requests` library.*

In [None]:
%pip install requests

In [11]:
import requests

In [None]:
response = requests.get('http://pyvec.org/cs/api.json')
data = json.loads(response.text)
data

# Working with API clients

## General client

A mobile weather app is a client that someone built for one specific task, and it can work with only one specific API. Such a client is useful if all we want is to know what the weather is like, but less so if we want to try working with several APIs. That is why there are general clients.

### Browser as a general client

If we only want to read from an API and the API does not require any login, we can try it in the browser as if it were an ordinary website. If we visit the [exchange rate list](https://www.cnb.cz/cs/financni-trhy/devizovy-trh/kurzy-devizoveho-trhu/kurzy-devizoveho-trhu/) on the CNB (Czech National Bank) website and click on [Text format](https://www.cnb.cz/en/financni-trhy/devizovy-trh/kurzy-devizoveho-trhu/kurzy-devizoveho-trhu/denni_kurz.txt?date=19.02.2020), we will see the response from the API server:

https://www.cnb.cz/cs/financni_trhy/devizovy_trh/kurzy_devizoveho_trhu/denni_kurz.txt

### Generic client on the command line: curl

If we need to log in to the API or want to do more complex things with it than just reading, the browser will not be enough.
That is why it is good to learn the curl program. It runs on the command line and is a Swiss Army knife for anyone working with web APIs.

#### Examples with curl

![title](static/curl.jpg)

When we enter and run the command, we tell curl to send a request to the given address and print whatever the CNB server sends back.

![title](static/curl-return.jpg)

## Own client

A general client must be operated by a person (setting parameters by hand, running it regularly based on conditions or time, etc.). That is exactly what we need when we want to try out an API, but the whole point of an API is that programs can use it automatically. If we want to program a client for a specific task, most languages offer either a built-in library or one we can install. In Python, we will use the Requests library.

Every decent API has documentation that describes how the whole API works: all available URLs (endpoints), methods, parameters, formats, error codes, etc. The documentation can take the form of a website, such as the [Prague data](https://golemioapi.docs.apiary.io/) or the [British police data](https://data.police.uk/docs/) that we will use shortly. A very common way to describe APIs is [OpenAPI](https://www.openapis.org/) (formerly Swagger). With this standard, the API is described in a text format, which can then be visualized, as in this example of a fictional [Pet Store](https://petstore.swagger.io/). Such a standardized description can also be processed by machines.
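Because an OpenAPI description is structured data (JSON or YAML), a program can read it. A minimal sketch with a hand-written excerpt that only imitates the shape of an OpenAPI 3 document; it is hypothetical and not taken from the real Pet Store description.

```python
import json

# A tiny, hand-written excerpt in the style of an OpenAPI 3 description (hypothetical)
openapi_text = '''
{
  "openapi": "3.0.0",
  "info": {"title": "Pet Store (excerpt)", "version": "1.0.0"},
  "paths": {
    "/pets": {
      "get": {"summary": "List all pets"},
      "post": {"summary": "Create a pet"}
    },
    "/pets/{petId}": {
      "get": {"summary": "Info for a specific pet"}
    }
  }
}
'''

spec = json.loads(openapi_text)

# A program can list every endpoint and method without a human reading the docs
endpoints = [
    (method.upper(), path, operation["summary"])
    for path, operations in spec["paths"].items()
    for method, operation in operations.items()
]
for endpoint in endpoints:
    print(*endpoint)
```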

## Golemio - Prague public data

Golemio is the Prague data platform. Its documentation can be found at https://golemioapi.docs.apiary.io/#. We will use data on cyclists passing through counting devices. Their locations and current numbers of passes can be seen on the interactive map https://unicam.camea.cz/Discoverer/BikeCounter/map.

In [None]:
%pip install requests

In [13]:
from datetime import datetime, timedelta
import json
import requests

Every query must be authorized with an API key. We obtain it after free registration at https://api.golemio.cz/api-keys/auth/sign-up.
Among other things, the key is used to limit the number of queries. You can currently send 10,000 queries in 10 seconds.
The API key is inserted into the query header named `x-access-token`. So let's prepare the headers; they will be used for all API queries.
*Source: https://golemioapi.docs.apiary.io/#introduction/general-info/usage*

In [14]:
GOLEMIO_API_KEY = 'enter your key here'
headers = {
    'Content-Type': 'application/json; charset=utf-8',
    'x-access-token': GOLEMIO_API_KEY,
}

The documentation of the endpoint with cyclist passes is here: https://golemioapi.docs.apiary.io/#reference/traffic/bicyclecounters/get-all-bicyclecounters.

Besides specifying the data, the documentation lets us test the API directly on the web; we just paste in our API key.

In [16]:
response = requests.get('https://api.golemio.cz/v2/bicyclecounters/', headers=headers)
response

<Response [200]>

In [17]:
response.raise_for_status()

In [18]:
type(response)

requests.models.Response

In [None]:
dir(response)

In [None]:
response.text

In [None]:
response.json()

In [None]:
response.status_code

In [None]:
data_json = json.loads(response.content)
data_json

Or, more simply, directly with the ready-made `json()` method of the response:

In [None]:
data_json = response.json()
data_json

In [22]:
type(data_json['features'])

list

In [23]:
data_json['features'][0]

{'geometry': {'coordinates': [14.3986383, 50.0718897], 'type': 'Point'},
 'properties': {'directions': [{'id': 'camea-BC_AL-PL',
    'name': 'Plzeňská (z centra)'},
   {'id': 'camea-BC_AL-ST', 'name': 'Štefánikova (centrum)'}],
  'id': 'camea-BC_AL-STPL',
  'name': 'Anděl (Plzeňská)',
  'route': 'A14',
  'updated_at': '2021-06-09T19:00:00.569Z'},
 'type': 'Feature'}

In [24]:
print(data_json['features'][0]['properties']['id'])
print(data_json['features'][0]['properties']['name'])
print(data_json['features'][0]['properties']['directions'][0]['id'])
print(data_json['features'][0]['properties']['directions'][1]['id'])

camea-BC_AL-STPL
Anděl (Plzeňská)
camea-BC_AL-PL
camea-BC_AL-ST


In [None]:
def get_bicycle_counters() -> dict:
    """ Return all bicycle counters """
    response = requests.get('https://api.golemio.cz/v2/bicyclecounters/', headers=headers)
    
    # this raises exception if response status code is error (starts with 4 or 5)
    response.raise_for_status()
    
    counters = {}
    for counter in response.json()['features']:
        counter_id = counter['properties']['id']
        counter_name = counter['properties']['name']
        direction_ids = [direction['id'] for direction in counter['properties']['directions'] if direction['id']]
        
        # skip empty counters
        if len(direction_ids) == 0:
            continue
        
        counters[counter_id] = {
            'name': counter_name,
            'direction_ids': direction_ids,
        }
    
    return counters

bicycle_counters = get_bicycle_counters()
bicycle_counters    
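One possible refinement (our suggestion, not part of the original material) is to separate the parsing from the HTTP call, so that `get_bicycle_counters` would just return `parse_bicycle_counters(response.json()['features'])` and the parsing could be tested without an API key. `parse_bicycle_counters` is a hypothetical name; the first sample feature copies the API output printed above, and the second one is made up to show that counters without directions are skipped.

```python
def parse_bicycle_counters(features: list) -> dict:
    """Build {counter_id: {'name': ..., 'direction_ids': [...]}} from GeoJSON features."""
    counters = {}
    for counter in features:
        properties = counter['properties']
        direction_ids = [d['id'] for d in properties['directions'] if d['id']]
        # skip counters without any direction
        if not direction_ids:
            continue
        counters[properties['id']] = {
            'name': properties['name'],
            'direction_ids': direction_ids,
        }
    return counters


sample_features = [
    {
        'geometry': {'coordinates': [14.3986383, 50.0718897], 'type': 'Point'},
        'properties': {
            'directions': [
                {'id': 'camea-BC_AL-PL', 'name': 'Plzeňská (z centra)'},
                {'id': 'camea-BC_AL-ST', 'name': 'Štefánikova (centrum)'},
            ],
            'id': 'camea-BC_AL-STPL',
            'name': 'Anděl (Plzeňská)',
        },
        'type': 'Feature',
    },
    # a made-up counter with no directions, which should be skipped
    {'properties': {'directions': [], 'id': 'example-empty', 'name': 'Empty'}, 'type': 'Feature'},
]

print(parse_bicycle_counters(sample_features))
```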

In [26]:
response = requests.get('https://api.golemio.cz/v2/bicyclecounters/detections?id=ecoCounter-103047647&aggregate=true', headers=headers)
response.json()

[{'id': 'ecoCounter-103047647',
  'value': 333809,
  'value_pedestrians': None,
  'locations_id': 'ecoCounter-100047647',
  'measurement_count': '47490',
  'measured_from': '1970-01-01T00:00:00.000Z',
  'measured_to': '2021-06-09T19:03:31.613Z'}]

In [29]:
def get_bike_count(counter_direction_id: str, time_from: datetime, duration: timedelta = None) -> int:
    """ Return number of bike detections of counter in one direction in specific time frame """
    if duration is None:
        duration = timedelta(days=1)
    params = {
        'id': counter_direction_id,
        'from': time_from.isoformat(),
        'to': (time_from + duration).isoformat(),
        'aggregate': 'true',
    }
    
    response = requests.get('https://api.golemio.cz/v2/bicyclecounters/detections', params=params, headers=headers)
    response.raise_for_status()
    
    # no measurements
    if len(response.json()) == 0:
        return 0
    
    return response.json()[0]['value']


# example usage
get_bike_count('camea-BC_AL-ST', datetime(2020, 12, 1), timedelta(days=1))

97

In [31]:
def get_all_directions_counts(station_id: str, *args, counters: dict = None, **kwargs) -> dict:
    """ Return number of bike detections in all directions in a dict (direction_id: count).
        Parameters are similar to get_bike_count function (see the usage on last line).
    """
    if counters is None:
        counters = get_bicycle_counters()
    
    counts = {}
    for direction_id in counters[station_id]['direction_ids']:
        counts[direction_id] = get_bike_count(direction_id, *args, **kwargs)
        
        
    return counts

get_all_directions_counts('camea-BC_VK-MOKO', datetime(2021, 6, 2, 11), timedelta(hours=1))

{'camea-BC_VK-KO': 67, 'camea-BC_VK-MO': 50}

### Tasks
* How many cyclists rode through Modřany yesterday between 6:00 and 11:00?
* Which place was the busiest yesterday? And which came next?
* The other tasks are unsolved; you can try them after the lesson. Downloading the data can take a long time.
    * How did cycling in 2020 compare to the previous year?
    * Where is the biggest difference between one direction and the other? (for example, in the last month)
    * Does a higher temperature mean more cyclists? Temperature measurements: https://golemioapi.docs.apiary.io/#reference/traffic/bicyclecounters/get-bicyclecounters-temperatures
        * try to visualize it
        * how strong is the correlation?
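For the first task, the `time_from` and `duration` arguments of `get_bike_count` can be computed from today's date. A sketch; `yesterday_window` is a hypothetical helper, and it builds naive datetimes, so whether the API interprets them as Prague local time or UTC is an assumption worth checking in the documentation.

```python
from datetime import date, datetime, timedelta


def yesterday_window(today: date, start_hour: int, end_hour: int) -> tuple:
    """Return (time_from, duration) for yesterday between start_hour and end_hour."""
    yesterday = today - timedelta(days=1)
    time_from = datetime(yesterday.year, yesterday.month, yesterday.day, start_hour)
    duration = timedelta(hours=end_hour - start_hour)
    return time_from, duration


# In the notebook, today would be date.today(); a fixed date keeps the example reproducible
time_from, duration = yesterday_window(date(2021, 6, 3), 6, 11)
print(time_from, duration)
```

The result could then be passed on, e.g. `get_bike_count(direction_id, *yesterday_window(date.today(), 6, 11))`.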

In [33]:
# Solution of the number of cyclists in Modřany in individual directions
for id, counter in get_bicycle_counters().items():
    if counter['name'] != 'Modřany':
        continue
    
    for direction_id in counter['direction_ids']:
        count = get_bike_count(direction_id, datetime(2021, 6, 2, 11), timedelta(hours=5))
        print(id, direction_id, count)
    

camea-BC_VK-MOKO camea-BC_VK-KO 687
camea-BC_VK-MOKO camea-BC_VK-MO 367


In [None]:
# Get the number of cyclists for all stations on a given day
# the download takes a while; interim results are printed
day_counts = []
for station_id in bicycle_counters:
    print(station_id, end='')
    counts = get_all_directions_counts(
        station_id, datetime(2021, 6, 1), duration=timedelta(days=1), counters=bicycle_counters
    )
    
    print(station_id, bicycle_counters[station_id]['name'], counts, sum(counts.values()))

    day_counts.append((station_id, bicycle_counters[station_id]['name'], sum(counts.values())))
    
day_counts

In [36]:
# the first two most frequented places - solution in pure Python
sorted_counts = sorted(
    day_counts,  # we want to sort this list
    key=lambda row: row[2],  # sort by the third item of the tuple
    reverse=True,  # from highest (the default is from lowest)
)

sorted_counts[:2]  # first two records

[('camea-BC_PT-ZOVO', 'Povltavská', 3357),
 ('camea-BC_PN-VYBR2', 'Podolské nábřeží - vozovka', 3331)]

In [37]:
import pandas as pd

In [38]:
# The first two busiest places - the solution in pandas
day_counts_df = pd.DataFrame(day_counts, columns=['station_id', 'name', 'day_count'])
day_counts_df = day_counts_df.set_index('station_id')
day_counts_df.sort_values('day_count', ascending=False)[:2]

| station_id | name | day_count |
|---|---|---|
| camea-BC_PT-ZOVO | Povltavská | 3357 |
| camea-BC_PN-VYBR2 | Podolské nábřeží - vozovka | 3331 |


## Crime in the UK
*Example of another public API*
We will try API queries with UK police stop-and-search data, which are available by month for an approximate location (see https://data.police.uk/docs/method/stops-at-location/).

In [39]:
api_url = "https://data.police.uk/api/stops-street"

We set the API call parameters according to the documentation at https://data.police.uk/docs/method/stops-at-location/. As the location, I chose the infamous Hackney district in London :)

In [40]:
params = {
    "lat": "51.5487158",
    "lng": "-0.0613842",
    "date": "2018-06",
}

We use the `get` function to send a request to the API URL. The URL with the parameters looks like this: https://data.police.uk/api/stops-street?lat=51.5487158&lng=-0.0613842&date=2018-06, and you can try it in a browser.
The `response` variable now holds an object containing the response from the API.

In [41]:
response = requests.get(api_url, params=params)

If the status code is anything other than 200 (success), the script prints an error message with the status code; otherwise it prints the first 100 characters of the data.

In [42]:
if response.status_code != 200:
    print('Failed to get data:', response.status_code)
else:
    print('First 100 characters of data are')
    print(response.text[:100])

First 100 characters of data are
[{"age_range":"18-24","outcome":"Community resolution","involved_person":true,"self_defined_ethnicit


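The headers carry additional information about the response. `response.headers` behaves like a case-insensitive dictionary, so `'content-type'` finds the `Content-Type` header.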

In [43]:
response.headers

{'Date': 'Wed, 09 Jun 2021 19:22:00 GMT', 'Content-Type': 'application/json', 'Content-Length': '5687', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Strict-Transport-Security': 'max-age=31536000;', 'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'Content-Security-Policy': "default-src 'self' 'unsafe-inline' ; script-src 'self' data: www.google-analytics.com ajax.googleapis.com 'unsafe-inline';", 'Referer-Policy': 'strict-origin-when-cross-origin'}

In [44]:
response.headers['content-type']

'application/json'

The raw content of the response is a byte string:

In [45]:
response.content[:200]

b'[{"age_range":"18-24","outcome":"Community resolution","involved_person":true,"self_defined_ethnicity":"Black\\/African\\/Caribbean\\/Black British - Any other Black\\/African\\/Caribbean background","gend'

It looks like a list of dictionaries, but the response object does not behave like one:

In [46]:
response[0]["age_range"]

TypeError: 'Response' object is not subscriptable

We convert the byte string into Python objects with the `.json()` method from the requests library:

In [47]:
data = response.json()

Let's verify the data type:

In [48]:
type(data)

list

Now we can work with `data` as an ordinary list:

In [49]:
data[0]["age_range"]

'18-24'

`json.dumps` converts the list back into a string; the `sort_keys` and `indent` parameters make the structure readable:

In [50]:
datas = json.dumps(data, sort_keys=True, indent=4)

In [51]:
print(datas[:1600])

[
    {
        "age_range": "18-24",
        "datetime": "2018-06-01T09:45:00+00:00",
        "gender": "Male",
        "involved_person": true,
        "legislation": "Misuse of Drugs Act 1971 (section 23)",
        "location": {
            "latitude": "51.551330",
            "longitude": "-0.068037",
            "street": {
                "id": 968551,
                "name": "On or near Downs Park Road"
            }
        },
        "object_of_search": "Controlled drugs",
        "officer_defined_ethnicity": "Black",
        "operation": false,
        "operation_name": null,
        "outcome": "Community resolution",
        "outcome_linked_to_object_of_search": null,
        "outcome_object": {
            "id": "bu-community-resolution",
            "name": "Community resolution"
        },
        "removal_of_more_than_outer_clothing": null,
        "self_defined_ethnicity": "Black/African/Caribbean/Black British - Any other Black/African/Caribbean background",
        "t

A list comprehension that collects the age range of each person stopped and searched by the police:

In [52]:
age_range = [i["age_range"] for i in data]

In [53]:
print(age_range)

['18-24', '18-24', 'over 34', '18-24', '10-17', '10-17', 'over 34', '25-34', 'over 34', '25-34', None, '25-34', '18-24', '10-17', None, '18-24', None, '18-24', '10-17', 'over 34', '18-24', '18-24', '18-24', '18-24', '18-24', '18-24', '18-24', '18-24', '18-24', '25-34', '18-24', '18-24', '18-24', 'over 34', '10-17', '10-17', '25-34', '18-24', '18-24', '25-34', '25-34', '25-34', 'over 34', 'over 34', '18-24', '18-24', '18-24', '18-24', '18-24', '25-34', '25-34', 'over 34', '25-34', 'over 34', '18-24', '25-34', '25-34', 'over 34', '18-24', None, '18-24', '18-24', None, '18-24', '18-24', '25-34', '10-17', '25-34', '18-24', '25-34', '18-24', None, '18-24', '25-34', '25-34', '25-34', '18-24', '25-34', '25-34', '18-24', '18-24', '10-17', 'over 34', 'over 34', '18-24', '18-24', '25-34', '10-17', '18-24', 'over 34', '10-17', '25-34', 'over 34', '18-24', '25-34', 'over 34', '25-34', '18-24', '18-24', '18-24', '18-24', '10-17', '10-17', '18-24', '25-34', '18-24', '25-34', '18-24', '18-24', '10-17

A list comprehension that collects the ID of the street where each stop and search took place:

In [54]:
street_id = [i["location"]["street"]["id"] for i in data]

In [55]:
print(street_id)

[968551, 968830, 968830, 968740, 964026, 964026, 968844, 968662, 968662, 968662, 971832, 971832, 968828, 968828, 968805, 968828, 968805, 968805, 968805, 968584, 964086, 968632, 968632, 964132, 968632, 968632, 968584, 968584, 968872, 971832, 968717, 968866, 971656, 964226, 968662, 968662, 968703, 968668, 968668, 968703, 964013, 968505, 968830, 968500, 968662, 968830, 968830, 968662, 968662, 968705, 964150, 968663, 968663, 968830, 968467, 968662, 968663, 968830, 964370, 964370, 968500, 964287, 964329, 971656, 971656, 968830, 968829, 968830, 968829, 968608, 968703, 968703, 968469, 968662, 968754, 968662, 968872, 968748, 968872, 968691, 968641, 968641, 964023, 964322, 968872, 968872, 968872, 968662, 964219, 964092, 964219, 968854, 968662, 968662, 968662, 968786, 968584, 968662, 964266, 964316, 964266, 968637, 968637, 968804, 968804, 968804, 971758, 968804, 968662, 964297, 968830, 968770, 968500, 968662, 968804, 968500, 964324, 964266, 964225, 968816, 968500, 964266, 968641, 968575, 968828,
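The comprehension above assumes that every record has a `location` with a `street`. If some record had `null` there (we did not check whether this API ever returns such records), `i["location"]["street"]["id"]` would raise a `TypeError`. A defensive sketch; `get_nested` is a hypothetical helper and the records are made up.

```python
def get_nested(record: dict, *keys, default=None):
    """Follow keys into nested dictionaries; return default if any level is missing or None."""
    value = record
    for key in keys:
        if not isinstance(value, dict) or value.get(key) is None:
            return default
        value = value[key]
    return value


# Made-up records: the second one has no location
records = [
    {"age_range": "18-24", "location": {"street": {"id": 968551}}},
    {"age_range": None, "location": None},
]

street_ids = [get_nested(r, "location", "street", "id") for r in records]
print(street_ids)
```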

In [56]:
import pandas as pd

We combine the lists into a dataframe

In [57]:
df_from_lists = pd.DataFrame(list(zip(age_range, street_id)), 
                columns = ['age_range', 'street_id'])

In [58]:
df_from_lists.head()

| | age_range | street_id |
|---|---|---|
| 0 | 18-24 | 968551 |
| 1 | 18-24 | 968830 |
| 2 | over 34 | 968830 |
| 3 | 18-24 | 968740 |
| 4 | 10-17 | 964026 |


Which age group did the police stop and search most often?

In [59]:
%matplotlib inline

ModuleNotFoundError: No module named 'matplotlib'

In [None]:
df_from_lists["age_range"].value_counts().plot.bar();

### json_normalize
Or how to easily convert JSON into a DataFrame

In [None]:
data

In [None]:
from pandas import json_normalize

In [None]:
norm_data = json_normalize(data)

In [None]:
norm_data.head()
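What `json_normalize` does with nested dictionaries can be illustrated with a pure-Python sketch: nested keys are joined with dots, so `location` → `street` → `id` becomes a single column `location.street.id` (pandas uses `.` as the default separator). This only illustrates the idea; it is not pandas' implementation, which also handles lists and other options. The record is shortened from the output above.

```python
def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dictionaries into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dictionaries
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


record = {
    "age_range": "18-24",
    "location": {"latitude": "51.551330", "street": {"id": 968551, "name": "On or near Downs Park Road"}},
}
print(flatten(record))
```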

In [None]:
norm_data["gender"].value_counts()

In [None]:
norm_data["gender"].value_counts().plot.bar();

In [None]:
norm_data["age_range"].value_counts().plot.bar();

### We create our own client

In the next block, we create a client that downloads the data for two months (instead of one), stores the DataFrame for each month in a list and finally concatenates them into a single DataFrame. We handle possible API connection errors with exceptions; for more, see the [requests documentation](https://requests.readthedocs.io/en/master/_modules/requests/exceptions/).

In [None]:
def get_uk_crime_data(latitude, longitude, dates_list):
    """
    Function loops through a list of months ("YYYY-MM")

    Three arguments: latitude, longitude and a list of months

    Returns a DataFrame with stop-and-search data for all the given months
    """
    api_url = "https://data.police.uk/api/stops-street"
    appended_data = []

    for date in dates_list:
        params = {
            "lat": latitude,
            "lng": longitude,
            "date": date,
        }
        try:
            response = requests.get(api_url, params=params)
            # raise an exception for 4xx/5xx status codes
            response.raise_for_status()
        except requests.exceptions.RequestException as error:
            print('Failed to get data for', date, ':', error)
            continue

        data = pd.json_normalize(response.json())
        # store the DataFrame in a list
        appended_data.append(data)

    return pd.concat(appended_data)

We call the `get_uk_crime_data` function with the latitude, longitude and list of months, and assign the result to the `df_uk_crime_data` variable:

In [None]:
dates_list = ["2018-06", "2018-07"]
lat = "51.5487158"
lng = "-0.0613842"
df_uk_crime_data = get_uk_crime_data(lat, lng, dates_list)

In [None]:
df_uk_crime_data.head()

## Accessing Tweets via the Twitter API using the Tweepy library

Command to install the tweepy library from inside the notebook:

In [None]:
%pip install tweepy

In [None]:
import tweepy

To obtain data from Twitter, our client must pass OAuth authorization.
**How does OAuth authorization work on Twitter?**
1. The application developer registers with the API provider.
2. The developer registers the application and obtains `consumer_key`, `consumer_secret`, `access_token` and `access_secret` at https://developer.twitter.com/en/apps.
3. The application calls the API and proves its identity with `consumer_key`, `consumer_secret`, `access_token` and `access_secret`.

In [None]:
consumer_key = ""
consumer_secret = ""
access_token = ""
access_secret = ""

The next step is to create an instance of `OAuthHandler`, into which we insert our consumer key and consumer secret, and then set the access token and access secret:

In [None]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

Verify that authentication works:

In [None]:
api = tweepy.API(auth)

try:
    api.verify_credentials()
    print("Authentication OK")
except Exception:
    print("Error during authentication")

In the Tweepy API documentation (http://docs.tweepy.org/en/v3.5.0/api.html) we can find, for example, a method that lists the IDs of an account's friends, i.e. the accounts it follows:

In [None]:
api.friends_ids('@kdnuggets')

Or one that lists the IDs of the account's followers:

In [None]:
api.followers_ids('@kdnuggets')

A method that returns the 20 most recent tweets of the given account:

In [None]:
twitter_user = api.user_timeline('@kdnuggets')

In [None]:
twitter_user

In [None]:
kdnuggets_tweets = [i.text for i in twitter_user]
kdnuggets_tweets

In [None]:
dir(twitter_user[0])

In [None]:
twitter_user[0].retweet_count

In [None]:
def get_tweets(consumer_key, consumer_secret, access_token, access_secret, twitter_account):
    """
    Function gets the last 20 tweets of the given account

    Five arguments consumer_key, consumer_secret, access_token, access_secret, and twitter_account name

    Returns a dataframe with tweets for given account
    """

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)

    # stays empty if authentication or downloading fails
    tweets_list = []

    try:
        api.verify_credentials()
        print("Authentication OK")
        twitter_user = api.user_timeline(twitter_account)

        tweets_list = [i.text for i in twitter_user]

    except Exception:
        print("Error during authentication")

    return pd.DataFrame(tweets_list, columns=[twitter_account])

In [None]:
%pip install pandas

In [None]:
import pandas as pd

In [None]:
get_tweets(consumer_key, consumer_secret, access_token, access_secret, '@honzajavorek')

We can also search for tweets by hashtag!

In [None]:
for tweet in api.search('#masks4all'):
    print(tweet.user.screen_name, tweet.text)
    print('---')

But this way we only get the last 20 tweets. If that is not enough, then according to the documentation of the [search](https://docs.tweepy.org/en/v3.5.0/api.html#API.search) method we can set the number of results per page, e.g. `rpp=30`, up to a maximum of 100. If we want more, we have to go through the results page by page, i.e. set the `page` parameter (e.g. `page=2`) and step through the pages in a loop. Pages are numbered from one.
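The page-by-page loop can be sketched independently of Twitter. `fetch_page` is a hypothetical function standing in for a call such as `api.search` with the `page` parameter described above; the fake pages make the example runnable offline, and the `max_pages` cap guards against looping forever.

```python
def fetch_all_pages(fetch_page, max_pages: int = 10) -> list:
    """Call fetch_page(page) for page = 1, 2, ... until a page comes back empty."""
    results = []
    # Pages are numbered from one
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break
        results.extend(items)
    return results


# A stand-in for the real API call: three pages of fake results, then nothing
fake_pages = {1: ["tweet 1", "tweet 2"], 2: ["tweet 3", "tweet 4"], 3: ["tweet 5"]}
all_items = fetch_all_pages(lambda page: fake_pages.get(page, []))
print(all_items)
```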