# APIs for data retrieval

An Application Programming Interface, commonly known as an API, is a set of protocols, routines, and tools for building software applications. APIs allow different software systems to communicate with each other and exchange data in a standardized and efficient way.

APIs for retrieving data enable developers to access and extract information from various sources such as databases, web services, or applications. These APIs often provide a structured and consistent way of accessing data, making it easier for developers to consume and use the data in their applications.

APIs for retrieving data can be used for a variety of purposes, such as gathering information for business intelligence, analyzing user behavior, or integrating data from different sources into a single application. These APIs often use standardized data formats, such as JSON or XML, to ensure compatibility and interoperability between different systems.

As the amount of data available online continues to grow, APIs for retrieving data have become an essential tool for developers to access and analyze this data. By leveraging these APIs, developers can quickly and easily retrieve the data they need, without having to manually extract and process it themselves.

### API types
There are several types of APIs, but two of the most commonly used are REST APIs and HTTP APIs.

- REST API (Representational State Transfer API):
REST stands for Representational State Transfer, and it's a set of architectural principles for building web services. A REST API is a type of web service that follows the REST architecture principles. REST APIs use HTTP methods (GET, POST, PUT, DELETE, etc.) to access and manipulate resources, which are identified by URIs (Uniform Resource Identifiers). REST APIs typically return data in JSON or XML format and are widely used for building web and mobile applications.


- HTTP API (Hypertext Transfer Protocol API):
HTTP stands for Hypertext Transfer Protocol, which is the protocol used for transferring data over the World Wide Web. An HTTP API is a type of web service that uses HTTP methods to access and manipulate resources. An HTTP API can be RESTful, but it doesn't have to be. HTTP APIs are often used for simple operations like CRUD (Create, Read, Update, Delete) on resources and return data in JSON or XML format.
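
As a rough illustration of how HTTP methods map onto CRUD operations, the sketch below builds (but never sends) one request per operation with the standard library's `urllib.request.Request`. The base URL `https://api.example.com/books` and the `book` payload are hypothetical placeholders, not a real service.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.example.com/books"  # hypothetical endpoint, for illustration only


def build_crud_requests(book_id, book):
    """Return one unsent Request per CRUD operation on a single resource."""
    body = json.dumps(book).encode("utf-8")
    json_header = {"Content-Type": "application/json"}
    return {
        "create": Request(BASE_URL, data=body, headers=json_header, method="POST"),
        "read": Request(f"{BASE_URL}/{book_id}", method="GET"),
        "update": Request(f"{BASE_URL}/{book_id}", data=body, headers=json_header, method="PUT"),
        "delete": Request(f"{BASE_URL}/{book_id}", method="DELETE"),
    }


crud = build_crud_requests(42, {"title": "Example"})
for operation, request in crud.items():
    # Each resource is identified by its URI; the method says what to do with it.
    print(operation, request.get_method(), request.full_url)
```

Nothing goes over the network here; the point is only that the same URI can be combined with different methods to read, change, or remove a resource.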


Other types of APIs include SOAP (Simple Object Access Protocol), GraphQL, and WebSockets. SOAP is an older protocol used for building web services, while GraphQL is a newer API technology that allows clients to specify the data they need and receive it in a single request. WebSockets are used for real-time, two-way communication between a client and a server.
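
To make the GraphQL idea a little more concrete, here is a minimal sketch of the JSON body a client typically sends in a POST request: a `query` string naming exactly the fields it wants, plus a `variables` object. The `hotel` type and its fields are a hypothetical schema invented for this example.

```python
import json

# Hypothetical schema: a `hotel` type with `name` and `rates` fields.
QUERY = """
query HotelRates($id: ID!) {
  hotel(id: $id) {
    name
    rates { partnerName price }
  }
}
"""


def build_graphql_body(query, variables):
    """Serialize a GraphQL query and its variables as the JSON body of a POST."""
    return json.dumps({"query": query, "variables": variables})


body = build_graphql_body(QUERY, {"id": "106005202"})
print(body)
```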

### Python and APIs

Python is a popular programming language that provides powerful tools for interacting with APIs. Here are the steps to interact with APIs using Python:

1. Import the necessary libraries: Python has several libraries that make it easy to interact with APIs, including requests, json, and urllib. Before making any API requests, you need to import the appropriate libraries.

2. Find the API endpoint: The endpoint is the URL that you will use to send your API requests. It's essential to understand the API documentation to find the correct endpoint for the specific data you want to retrieve.

3. Send a request: Once you have the endpoint, you can use Python's requests library to send an HTTP request to the API endpoint. The requests library has several methods for sending different types of HTTP requests, including GET, POST, PUT, DELETE, and more.

4. Parse the response: The API response will typically be in JSON format. Python's json library can be used to parse the JSON data and convert it into a Python dictionary that you can easily work with in your code.

5. Extract the data: Once you have the API response in a Python dictionary, you can extract the data you need and use it in your application.

Python's ease of use and powerful libraries make it an excellent language for interacting with APIs. With just a few lines of code, you can send requests to APIs, parse the response data, and extract the information you need to build powerful applications.
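
The five steps above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the endpoint `https://api.example.com/v1/observations`, the `stationId` value, and the shape of the response (`records` holding a list of `value` entries) are all made up, and a canned response replaces the real network call so the sketch runs offline.

```python
import json                              # step 1: import the libraries
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://api.example.com/v1/observations"  # step 2: hypothetical endpoint


def build_url(endpoint, params):
    """Attach query parameters to the endpoint."""
    return f"{endpoint}?{urlencode(params)}"


def fetch(url):
    """Step 3: send a GET request and return the raw body (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def extract_values(raw_json, field):
    """Steps 4 and 5: parse the JSON text, then pull one field from each record."""
    payload = json.loads(raw_json)
    return [record[field] for record in payload["records"]]


url = build_url(ENDPOINT, {"stationId": "06180", "limit": 2})
# A canned response stands in for fetch(url) so the sketch runs offline.
sample = '{"records": [{"value": 4.2}, {"value": 5.1}]}'
print(url)
print(extract_values(sample, "value"))
```

With the `requests` package, steps 3 and 4 usually shrink to a `requests.get(...)` call followed by `.json()` on the response.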

### Using the `requests` package

To do this, we need to know how to send requests first. We will use an amazing package called [`requests`](http://docs.python-requests.org/en/master/). If you do not have it installed, please install it using e.g. `pip` (in your command prompt or terminal):


```$ pip install requests```


In [8]:
pip install requests

Note: you may need to restart the kernel to use updated packages.


In [6]:
import requests # library for making HTTP requests
import pandas as pd # library for data analysis
import datetime as dt # library for handling date and time objects

###### Open DMI weather data

Go to the [documentation](https://confluence.govcloud.dk/display/FDAPI/Danish+Meteorological+Institute+-+Open+Data)

Using weather as an example, we first need to know the request URL (where the request is sent) and which parameters it takes (e.g., the API key and `stationId`). In our case, we know both our API key and the `stationId` to query, so we can do the following.

You will have to create a user and retrieve an API key for the API you want to use (see [how to](https://confluence.govcloud.dk/display/FDAPI/User+Creation)).

I have saved my API key in a file named ```.env```:
    
    api_key = your_api_key 

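Specifically, we will look at [Meteorological Observation](https://confluence.govcloud.dk/display/FDAPI/Meteorological+Observation). Note that the cells below demonstrate the same request pattern against a hotel-price endpoint hosted on RapidAPI.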

In [1]:
my_dict = {}

with open(r"/Users/christianhellum/Cand. Merc./Data-Science-Project/data_science_project/.env", "r") as f:
    for line in f:
        key, val = line.split('=', 1)  # split on the first '=' only, so values may contain '='
        my_dict[key.strip()] = val.strip()
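
A slightly more defensive variant of this parsing step, as a sketch: it skips blank lines and `#` comments and splits only on the first `=`, so a value that itself contains `=` survives intact. The helper name `parse_env_lines` and the sample lines are made up for this example.

```python
def parse_env_lines(lines):
    """Parse KEY = VALUE lines into a dict, ignoring blanks and # comments."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        settings[key.strip()] = value.strip()
    return settings


sample_lines = [
    "# credentials (placeholder values)",
    "",
    "api_key = abc=123",
]
print(parse_env_lines(sample_lines))
```

Because the function takes any iterable of strings, it can be passed an open file object directly, for example `parse_env_lines(open(".env"))`.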

In [3]:
headers = my_dict
print(headers)

{'X-RapidAPI-Key': '<redacted>', 'X-RapidAPI-Host': 'sky-scrapper.p.rapidapi.com'}


In [4]:
querystring = {"hotelId":"106005202","entityId":"27537542","checkin":"2024-04-13","checkout":"2024-04-20","adults":"2","rooms":"1","currency":"USD","market":"en-US","countryCode":"US"}

In [7]:
url = "https://sky-scrapper.p.rapidapi.com/api/v1/hotels/getHotelPrices"
r = requests.get(url, headers=headers, params=querystring) # Issues an HTTP GET request
r.url # `requests` helps us encode the URL in the correct format

'https://sky-scrapper.p.rapidapi.com/api/v1/hotels/getHotelPrices?hotelId=106005202&entityId=27537542&checkin=2024-04-13&checkout=2024-04-20&adults=2&rooms=1&currency=USD&market=en-US&countryCode=US'
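
To see roughly what this encoding step involves, the sketch below reproduces it with the standard library's `urllib.parse.urlencode` on a subset of the parameters above. It is an approximation for illustration, not a description of how `requests` works internally; the second call uses a made-up value to show percent-encoding of unsafe characters.

```python
from urllib.parse import urlencode

url = "https://sky-scrapper.p.rapidapi.com/api/v1/hotels/getHotelPrices"
params = {"hotelId": "106005202", "checkin": "2024-04-13", "market": "en-US"}

# Each key/value pair becomes key=value, joined with '&'.
query = urlencode(params)
print(f"{url}?{query}")

# Characters that are not URL-safe get encoded (hypothetical value).
encoded = urlencode({"q": "double room & breakfast"})
print(encoded)
```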

In [9]:
r.status_code # 200 means success

200
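
Codes other than 200 are worth checking for before parsing a response. As a small sketch, the helper below (a hypothetical name, `describe_status`) uses the standard library's `http.HTTPStatus` to turn a code into a readable description. With `requests`, calling `r.raise_for_status()` is a common shortcut that raises an exception on 4xx and 5xx responses.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase}: {kind}"


for code in (200, 401, 404, 429, 500):
    print(describe_status(code))
```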

In [10]:
hotel = r.json()
hotel

{'status': True,
 'timestamp': 1712560010960,
 'data': {'metaInfo': {'ratesCta': 'Go to site',
   'rates': [{'partnerName': 'SuperTravel',
     'partnerLogo': 'https://www.skyscanner.com/images/websites/220x80/h_11.png',
     'partnerId': 'h_11',
     'roomType': 'Double room',
     'roomPolicies': 'Meals not included, Non-Refundable',
     'deeplink': 'www.skyscanner.com/hotel_deeplink/4.0/US/en-US/USD/h_11/106005202/2024-04-13/2024-04-20/hotel/hotel/hotels?guests=2&rooms=1&legacy_provider_id=926482&request_id=642c5800-d5bc-464d-8834-ba4ba5f8514e&q_datetime_utc=2024-04-08T07%3A06%3A50&redirect_delay=1000&appName=goiPhone&appVersion=7.92.1&client_id=skyscanner_app&tm_city_code=NYCA&tm_country_code=US&tm_place_name=New+York&tm_stars=4&ticket_price=1235.0&deeplink_data=eyJmaWVsZHMiOiB7InNpZ25hdHVyZSI6ICJiOWU3ZmQwNjRjMTA5YzUwZDQ5OWZlOTMyOWQ4OTk5ZSJ9LCAidXJsIjogImh0dHBzOi8vd3d3LnN1cGVyLmNvbS90cmF2ZWwvdHJhbnNpdGlvbi8%2FYWx3ZF9yaXNrPWZhbHNlJnJ0cD1TZFRVcWRSN3dJZXc4YVJTOWVzQXR3JTI1M0QlMjUzRCZy

The JSON object is converted into a `dict`, the Python data structure that holds key-value pairs. To access particular values, we index into it like any other `dict`.

In [11]:
hotel_labels = hotel['data']['metaInfo']['rates']
print(hotel_labels)

[{'partnerName': 'SuperTravel', 'partnerLogo': 'https://www.skyscanner.com/images/websites/220x80/h_11.png', 'partnerId': 'h_11', 'roomType': 'Double room', 'roomPolicies': 'Meals not included, Non-Refundable', 'deeplink': 'www.skyscanner.com/hotel_deeplink/4.0/US/en-US/USD/h_11/106005202/2024-04-13/2024-04-20/hotel/hotel/hotels?guests=2&rooms=1&legacy_provider_id=926482&request_id=642c5800-d5bc-464d-8834-ba4ba5f8514e&q_datetime_utc=2024-04-08T07%3A06%3A50&redirect_delay=1000&appName=goiPhone&appVersion=7.92.1&client_id=skyscanner_app&tm_city_code=NYCA&tm_country_code=US&tm_place_name=New+York&tm_stars=4&ticket_price=1235.0&deeplink_data=eyJmaWVsZHMiOiB7InNpZ25hdHVyZSI6ICJiOWU3ZmQwNjRjMTA5YzUwZDQ5OWZlOTMyOWQ4OTk5ZSJ9LCAidXJsIjogImh0dHBzOi8vd3d3LnN1cGVyLmNvbS90cmF2ZWwvdHJhbnNpdGlvbi8%2FYWx3ZF9yaXNrPWZhbHNlJnJ0cD1TZFRVcWRSN3dJZXc4YVJTOWVzQXR3JTI1M0QlMjUzRCZyaXNrX2xrXzE9ZmFsc2UmYWx3ZF9yYW5kPWZhbHNlJnByb3ZpZGVyX2hvdGVsX2lkPTYyOTUmcHJvdmlkZXI9ZWFuJnByaWNlPTEyMzQuNzImdG90YWxfcHJpY2U9MTU1OC45

In [12]:
for labels in hotel_labels:
    for key, value in labels.items():
        print(key, value)

partnerName SuperTravel
partnerLogo https://www.skyscanner.com/images/websites/220x80/h_11.png
partnerId h_11
roomType Double room
roomPolicies Meals not included, Non-Refundable
deeplink www.skyscanner.com/hotel_deeplink/4.0/US/en-US/USD/h_11/106005202/2024-04-13/2024-04-20/hotel/hotel/hotels?guests=2&rooms=1&legacy_provider_id=926482&request_id=642c5800-d5bc-464d-8834-ba4ba5f8514e&q_datetime_utc=2024-04-08T07%3A06%3A50&redirect_delay=1000&appName=goiPhone&appVersion=7.92.1&client_id=skyscanner_app&tm_city_code=NYCA&tm_country_code=US&tm_place_name=New+York&tm_stars=4&ticket_price=1235.0&deeplink_data=eyJmaWVsZHMiOiB7InNpZ25hdHVyZSI6ICJiOWU3ZmQwNjRjMTA5YzUwZDQ5OWZlOTMyOWQ4OTk5ZSJ9LCAidXJsIjogImh0dHBzOi8vd3d3LnN1cGVyLmNvbS90cmF2ZWwvdHJhbnNpdGlvbi8%2FYWx3ZF9yaXNrPWZhbHNlJnJ0cD1TZFRVcWRSN3dJZXc4YVJTOWVzQXR3JTI1M0QlMjUzRCZyaXNrX2xrXzE9ZmFsc2UmYWx3ZF9yYW5kPWZhbHNlJnByb3ZpZGVyX2hvdGVsX2lkPTYyOTUmcHJvdmlkZXI9ZWFuJnByaWNlPTEyMzQuNzImdG90YWxfcHJpY2U9MTU1OC45NCZzdXJjaGFyZ2U9MzA1LjI3JmNoZWNraW5f
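
Chained indexing such as `hotel['data']['metaInfo']['rates']` raises a `KeyError` as soon as one level is missing, which can happen when an API returns an error payload instead of data. A minimal sketch of a more forgiving lookup follows; `get_nested` is a hypothetical helper, and `hotel_sample` is a trimmed-down stand-in for the real response.

```python
def get_nested(data, *keys, default=None):
    """Follow a chain of dict keys, returning `default` if any key is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed-down stand-in for the response shown above.
hotel_sample = {
    "status": True,
    "data": {"metaInfo": {"rates": [{"partnerName": "SuperTravel", "roomType": "Double room"}]}},
}

rates = get_nested(hotel_sample, "data", "metaInfo", "rates", default=[])
print([rate["partnerName"] for rate in rates])
print(get_nested(hotel_sample, "data", "reviews", default="missing"))
```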

Now it gets interesting, as we can put the values into a dataframe (more on dataframes later).

In [13]:
import pandas as pd

lst = []

for values in hotel_labels:
    lst.append(pd.DataFrame.from_dict(values, orient='index').transpose())

In [14]:
df = pd.concat(lst).reset_index()

In [15]:
df

| | index | partnerName | partnerLogo | partnerId | roomType | roomPolicies | deeplink | rawPrice | rawPriceGbp | price | rateBriefFeatures | isOfficial | isShowHotelName | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SuperTravel | https://www.skyscanner.com/images/websites/220... | h_11 | Double room | Meals not included, Non-Refundable | www.skyscanner.com/hotel_deeplink/4.0/US/en-US... | 176 | 140 | $176 | [Meals not included, Non-Refundable, Double room] | False | False | 34 |
| 1 | 0 | Prestigia | https://www.skyscanner.com/images/websites/220... | h_pi | Double room | Meals not included, Non-Refundable | www.skyscanner.com/hotel_deeplink/4.0/US/en-US... | 204 | 161 | $204 | [Meals not included, Non-Refundable, Double room] | False | False | 33 |
| 2 | 0 | Expedia | https://www.skyscanner.com/images/websites/220... | h_xp | Room | Meals not included, Non-Refundable | www.skyscanner.com/hotel_deeplink/4.0/US/en-US... | 228 | 180 | $228 | [Meals not included, Non-Refundable, Room] | False | False | 31 |
| 3 | 0 | Hotels.com | https://www.skyscanner.com/images/websites/220... | h_hc | Room | Meals not included, Non-Refundable | www.skyscanner.com/hotel_deeplink/4.0/US/en-US... | 228 | 180 | $228 | [Meals not included, Non-Refundable, Room] | False | False | 30 |
| 4 | 0 | Travelocity | https://www.skyscanner.com/images/websites/220... | h_tc | Room | Meals not included, Non-Refundable | www.skyscanner.com/hotel_deeplink/4.0/US/en-US... | 228 | 180 | $228 | [Meals not included, Non-Refundable, Room] | False | False | 29 |
| 5 | 0 | ZenHotels | https://www.skyscanner.com/images/websites/220... | h_zh | Room | Room Only, Non-Refundable | www.skyscanner.com/hotel_deeplink/4.0/US/en-US... | 228 | 180 | $228 | [Room Only, Non-Refundable, Room] | False | False | 28 |
| 6 | 0 | Priceline | https://www.skyscanner.com/images/websites/220... | h_pr | Room | Meals not included, Non-Refundable | www.skyscanner.com/hotel_deeplink/4.0/US/en-US... | 228 | 180 | $228 | [Meals not included, Non-Refundable, Room] | False | False | 27 |
| 7 | 0 | Edreams | https://www.skyscanner.com/images/websites/220... | h_ei | Double room | Meals not included, Non-Refundable | www.skyscanner.com/hotel_deeplink/4.0/US/en-US... | 228 | 180 | $228 | [Meals not included, Non-Refundable, Double room] | False | False | 26 |
| 8 | 0 | Booking.com | https://www.skyscanner.com/images/websites/220... | h_bc | Double room | Meals not included, Non-Refundable | www.skyscanner.com/hotel_deeplink/4.0/US/en-US... | 228 | 180 | $228 | [Meals not included, Non-Refundable, Double room] | False | False | 25 |
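
When every record is a flat dictionary, as the rate entries appear to be, the list can usually be handed to `pd.DataFrame` directly instead of building one small frame per record. The sketch below assumes `pandas` is installed and uses two abbreviated records (a subset of the columns shown above) rather than the full response.

```python
import pandas as pd

# Two abbreviated records in the same shape as `hotel_labels`.
records = [
    {"partnerName": "SuperTravel", "roomType": "Double room", "rawPrice": 176},
    {"partnerName": "Prestigia", "roomType": "Double room", "rawPrice": 204},
]

# One row per dict, one column per key; the index runs 0..n-1 automatically.
df_direct = pd.DataFrame(records)
print(df_direct)
```

This also avoids the extra `index` column of zeros that `reset_index()` adds after concatenating single-row frames.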


### Use packages: Twitter API as an example

Many web services already have Python packages that wrap their APIs. With these packages, we can get started right away by following their documentation and examples, without writing the HTTP requests by hand.

We will use the <a href="https://apps.twitter.com/" target="_blank">Twitter API</a> as an example. First, install the `python-twitter` package as shown [here](https://python-twitter.readthedocs.io/en/latest/installation.html).

You have to register an account for Twitter Developer and register an app. 

- Go to https://dev.twitter.com/ and set up an app. <a href="https://python-twitter.readthedocs.io/en/latest/getting_started.html" target="_blank">Here</a>'s a quick start on how you can do this.

After obtaining *__consumer key__*, *__consumer secret__*, *__access token__*, and *__access token secret__*, you are ready to retrieve some data from Twitter!

I have created a .env file in my root folder in the following way:

```
consumer_key = your_consumer_key 
consumer_secret = your_consumer_secret
access_token = your_access_token
access_secret = your_access_secret
```

In [20]:
# Read the keys and tokens into a dictionary
my_dict = {}

with open("../.env", "r") as f:
    for line in f:
        key, val = line.split('=', 1)  # split on the first '=' only, so values may contain '='
        my_dict[key.strip()] = val.strip()

In [21]:
my_dict

{'api_key': '<redacted>',
 'consumer_key': '<redacted>',
 'consumer_secret': '<redacted>',
 'access_token': '<redacted>',
 'access_secret': '<redacted>'}

In [22]:
## Load the twitter package, a well-written Python package for the Twitter APIs
import twitter
api = twitter.Api(consumer_key=my_dict['consumer_key'],
                  consumer_secret=my_dict['consumer_secret'],                  
                  access_token_key=my_dict['access_token'],
                  access_token_secret=my_dict['access_secret'])

## check status
api.VerifyCredentials()

User(ID=966273572015624192, ScreenName=PetriBagger)

In [23]:
statuses = api.GetUserTimeline(screen_name="PetriBagger")
for s in statuses:
    print(s.text)

@BornsVilkar @trygfonden @sanne_lind @politiken Det er en temmelig frisk konklusion at smide ud i offentligheden ud… https://t.co/HlVxJuvuWg
#kapacity testing Azure logic apps
RT @JosephineFock: Stop den overvågning, helt uacceptabelt at vi fortsat overvåger os alle #dkpol #stoplogning https://t.co/aIFu18Y8hj
Jeg håber flere vil ty til civil ulydighed og boykotte de nationale trivselsmålinger af vores børn. Det er forrykt… https://t.co/GAf4zIJIvF


In [24]:
statuses = api.GetUserTimeline(user_id="966273572015624192")
for s in statuses:
    print(s.created_at)

Tue May 05 18:12:25 +0000 2020
Thu Oct 25 07:24:33 +0000 2018
Sun Mar 18 20:51:56 +0000 2018
Mon Mar 12 21:52:22 +0000 2018


In [25]:
friends = api.GetFriends()
for f in friends:
    print(f.name)

Jakob Sorgenfri Kjær
SpeakerAllan
AllanGravgaardMadsen
Jeppe Søe
Erik Lahm
Nielbo
Astrid Porsmose
Thor Hampus Bank
Kirsten Birgit
Søren Bo Steendahl
Carl Johannes Borris
Morten H.J. Fenger
Martin Ibsen


More interestingly, let's get some tweets from Twitter. Let's search for English-language tweets related to `gpt4` posted since 2023-01-01.
- See https://dev.twitter.com/rest/public/search for more information on how to construct a query.

In [26]:
results = api.GetSearch(
    raw_query="q=gpt4&src=typed_query&since=2023-01-01&lang=en")
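
Writing the raw query by hand gets error-prone once search terms contain spaces or symbols. As a sketch, the string can be assembled from a dictionary with the standard library's `urlencode`; the helper name `build_search_query` is made up for this example, and the parameter names are simply the ones used in the cell above.

```python
from urllib.parse import urlencode


def build_search_query(term, since, lang):
    """Assemble a raw query string for a tweet search."""
    return urlencode({"q": term, "src": "typed_query", "since": since, "lang": lang})


raw_query = build_search_query("gpt4", "2023-01-01", "en")
print(raw_query)
```

The resulting string could then be passed as `raw_query=` to `api.GetSearch`.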


In [29]:
len(results)

15

In [31]:
print(results)

[Status(ID=1638131840908419074, ScreenName=JLLEWIS0000, Created=Tue Mar 21 10:54:00 +0000 2023, Text='GM!☕️☀️  Glad to see $GPT4 is 10x now, and still pumping!🦾🦾 \n\n👉https://t.co/Trw8nmEidZ\n\n  #airdrop #NFT $HEX $SUI… https://t.co/vwZ4krb2nR'), Status(ID=1638131839263993856, ScreenName=TrackierHQ, Created=Tue Mar 21 10:54:00 +0000 2023, Text="We tried creating ad copies for our Mobile Marketing Platform using @HubSpot's latest https://t.co/n2SMtwH2xQ. \n\nHe… https://t.co/tqJXvbthwT"), Status(ID=1638131830204542977, ScreenName=MehdiBahramii, Created=Tue Mar 21 10:53:57 +0000 2023, Text='GM!☕️☀️  Glad to see $GPT4 is 10x now, and still pumping!🦾🦾 \n\n👉https://t.co/tXrxXEYZ2C\n\n  #airdrop #NFT $ETH $GPT4… https://t.co/BjwonlFy69'), Status(ID=1638131814060683264, ScreenName=Breezy_Shawtey, Created=Tue Mar 21 10:53:54 +0000 2023, Text='GM!☕️☀️  Glad to see $GPT4 is 10x now, and still pumping!🦾🦾 \n\n👉https://t.co/3C0qXRsRX1\n\n  #airdrop #NFT $BNB $RDNT… https://t.co/WhJ1F4xdnW'), Statu

In [28]:
for tw in results:
    # \033[41m switches to a red background; \033[0m resets the style
    print(f"{tw.text}. Tweeted by \033[41m{tw.user.screen_name}\033[0m")

GM!☕️☀️  Glad to see $GPT4 is 10x now, and still pumping!🦾🦾 

👉https://t.co/Trw8nmEidZ

  #airdrop #NFT $HEX $SUI… https://t.co/vwZ4krb2nR. Tweeted by JLLEWIS0000
We tried creating ad copies for our Mobile Marketing Platform using @HubSpot's latest https://t.co/n2SMtwH2xQ. 

He… https://t.co/tqJXvbthwT. Tweeted by TrackierHQ
GM!☕️☀️  Glad to see $GPT4 is 10x now, and still pumping!🦾🦾 

👉https://t.co/tXrxXEYZ2C

  #airdrop #NFT $ETH $GPT4… https://t.co/BjwonlFy69. Tweeted by MehdiBahramii
GM!☕️☀️  Glad to see $GPT4 is 10x now, and still pumping!🦾🦾 

👉https://t.co/3C0qXRsRX1

  #airdrop #NFT $BNB $RDNT… https://t.co/WhJ1F4xdnW. Tweeted by Breezy_Shawtey
RT @inbox1402: $GPT4 is the future of AI and #Crypto 🚀🚀🚀 

👉https://t.co/9lwI4VcpO6

  #airdrop #NFT $CFX #ChatGPT $EVMOS $GPT4 $SSV $ARB $…. Tweeted by DatzRealDae
ChatGPT-assisted diagnosis: Is the future suddenly here? | @statnews  
#AI #ChatGPT #chatgpt4 #GPT4 #HealthTech… https://t.co/JThh

I went through some examples of using APIs to get various types of data in Python. The last Twitter example is relatively superficial. There are other packages for Twitter; the [`tweepy`](http://www.tweepy.org/) package seems to be pretty popular as well.

return to [overview](../00_overview.ipynb)