<a href="https://colab.research.google.com/github/matthewpecsok/data_engineering/blob/main/tutorials/de_api_llm_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# APIs and LLMs Tutorial

This notebook is a template for the homework assignment on APIs and LLMs. You may use all of the code provided.

Make sure your runtime is set to a GPU or TPU!

## Import the Google Places API token

In [1]:
from google.colab import userdata
API_KEY = userdata.get('google_maps_api')

In [2]:
import requests

## Define the business name

This is the name of the business we will identify with the findplace endpoint.

In [111]:
BUSINESS_NAME = 'sweetaly gelato fifteenth and fifteenth utah'

## Use the findplace API endpoint

We need the place_id of the business, so we'll use the findplace endpoint to look up its unique identifier.

Note that searching by a common name may return multiple candidate places. We'll assume the first one is the correct business.

We send a GET request to a URL that contains the business name and the API key.

We then extract the JSON data from the response.

In [112]:
find_place_url = f"https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input={BUSINESS_NAME}&inputtype=textquery&key={API_KEY}"
response = requests.get(find_place_url)
json_response = response.json()
json_response

{'candidates': [{'place_id': 'ChIJ41C4NARgUocRPkliM-EwBXw'}], 'status': 'OK'}
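
The business name contains spaces, and other names may contain characters such as `&` that have special meaning in a URL. As a minimal sketch, the query string can be built with the standard library so every parameter is percent-encoded. The helper name `build_find_place_url` and the placeholder key are assumptions for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode

FIND_PLACE_ENDPOINT = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"

def build_find_place_url(business_name, api_key):
    """Return the findplace URL with every query parameter encoded."""
    query = urlencode({
        "input": business_name,
        "inputtype": "textquery",
        "key": api_key,
    })
    return f"{FIND_PLACE_ENDPOINT}?{query}"

# "YOUR_API_KEY" is a placeholder, not a real key.
url = build_find_place_url("sweetaly gelato fifteenth and fifteenth utah", "YOUR_API_KEY")
print(url)
```

Passing the same dictionary as `params=` to `requests.get` should produce an equivalently encoded request without building the string by hand.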

Get the list of candidates from the dictionary key 'candidates'. If the key isn't found, fall back to an empty list.

In [113]:
candidates = json_response.get('candidates', [])
candidates

[{'place_id': 'ChIJ41C4NARgUocRPkliM-EwBXw'}]

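Assume the first element is the one we want and read its place_id, falling back to '' if the key is missing. Note that candidates[0] raises an IndexError if the list of candidates is empty.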

In [114]:
place_id = candidates[0].get('place_id','')
place_id

'ChIJ41C4NARgUocRPkliM-EwBXw'
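
If the search returns no candidates, indexing the empty list fails. The sketch below wraps the two lookups in one function that returns '' in that case as well. The function name `first_place_id` is hypothetical, and the sample dictionaries only mimic the response shape shown above.

```python
def first_place_id(find_place_json):
    """Return the place_id of the first candidate, or '' if there is none."""
    candidates = find_place_json.get("candidates", [])
    if not candidates:
        return ""
    return candidates[0].get("place_id", "")

# Sample data shaped like the findplace response above.
found = {"candidates": [{"place_id": "ChIJ41C4NARgUocRPkliM-EwBXw"}], "status": "OK"}
not_found = {"candidates": [], "status": "ZERO_RESULTS"}

print(first_place_id(found))
print(repr(first_place_id(not_found)))
```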

Great, we now have a place_id (Google's unique identifier for this business).

Let's use Google's API again to retrieve the reviews for the place.

## Retrieve Business Reviews

We now reduce the number of steps and extract the reviews in a single line of code rather than many lines. Each .get supplies a default, so a missing 'result' or 'reviews' key yields an empty list instead of an error.

In [115]:
details_url = f"https://maps.googleapis.com/maps/api/place/details/json?place_id={place_id}&fields=reviews&key={API_KEY}"
response = requests.get(details_url)
reviews = response.json().get('result', {}).get('reviews', [])
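
To see why the chained lookup is safe, here is a small sketch that runs it on hand-written dictionaries instead of a live response. The function name `extract_reviews` and the sample data are assumptions for illustration.

```python
def extract_reviews(details_json):
    """Return the list of reviews, or [] if 'result' or 'reviews' is absent."""
    return details_json.get("result", {}).get("reviews", [])

# Hand-written stand-ins for the place details response.
with_reviews = {"result": {"reviews": [{"author_name": "Jeremy Peters", "rating": 5}]}}
without_result = {"status": "INVALID_REQUEST"}

print(extract_reviews(with_reviews))
print(extract_reviews(without_result))
```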

In [116]:
reviews

[{'author_name': 'Jeremy Peters',
  'author_url': 'https://www.google.com/maps/contrib/118110536576238973697/reviews',
  'language': 'en',
  'original_language': 'en',
  'profile_photo_url': 'https://lh3.googleusercontent.com/a/ACg8ocKIf0B_B-qlMBICgTo2WqrKsxW7Dw_WMTOpU2nmrRAvxpsMAA=s128-c0x00000000-cc-rp-mo-ba5',
  'rating': 5,
  'relative_time_description': '4 months ago',
  'text': "Stopped by after a field trip and the kids loved everything! We got Raspberry, Mint Chips, and a Root Beer Float. Cute place, indoor and outdoor seating, large variety of ever changing Gelato flavors. Cozy and quaint or groups, you'll love it all!",
  'time': 1711582958,
  'translated': False},
 {'author_name': 'Patience A',
  'author_url': 'https://www.google.com/maps/contrib/115041665712790239449/reviews',
  'language': 'en',
  'original_language': 'en',
  'profile_photo_url': 'https://lh3.googleusercontent.com/a/ACg8ocKSvXs1hGL8Ygf3-tQ56od8NXXDX6H4ecSpBCi5u_QWdbRn=s128-c0x00000000-cc-rp-mo',
  'rating'

# Hugging Face Setup

This code loads the Llama model. It will take a while because the model is quite large.

We import the required packages, define the token variable, and create a pipeline object. The pipeline defines the task the model will perform, which model to use, the access token, and the device placement (device_map), which uses the GPU if one is available.

In [4]:
import torch
from transformers import pipeline

HF_TOKEN = userdata.get('HF_TOKEN') # Your token must be in this secret.

pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto",token=HF_TOKEN)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


## Process a single review

For now, let's process just one review to get comfortable with the workflow.

In [117]:
review = reviews[0]['text']
review

"Stopped by after a field trip and the kids loved everything! We got Raspberry, Mint Chips, and a Root Beer Float. Cute place, indoor and outdoor seating, large variety of ever changing Gelato flavors. Cozy and quaint or groups, you'll love it all!"

## LLM chat setup

Create a chat list that defines what we want the LLM to accomplish and what content it should use.

The dictionary with role = system defines the behavior we intend.
The dictionary with role = user holds the review text to be evaluated.

In [118]:
chat = [
    {"role": "system", "content": "What specific flavors of gelato were mentioned? Return a list of strings with each flavor an element. If none were mentioned return a list with an element saying 'flavor-missing' only return the list. nothing else."},
    {"role": "user", "content": review}
]

chat

[{'role': 'system',
  'content': "What specific flavors of gelato were mentioned? Return a list of strings with each flavor an element. If none were mentioned return a list with an element saying 'flavor-missing' only return the list. nothing else."},
 {'role': 'user',
  'content': "Stopped by after a field trip and the kids loved everything! We got Raspberry, Mint Chips, and a Root Beer Float. Cute place, indoor and outdoor seating, large variety of ever changing Gelato flavors. Cozy and quaint or groups, you'll love it all!"}]
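
Since every review needs the same system message, one option is to wrap the chat construction in a small function. This is only a sketch: the names `SYSTEM_PROMPT` and `build_chat` are hypothetical, and the prompt text is copied from the cell above.

```python
SYSTEM_PROMPT = (
    "What specific flavors of gelato were mentioned? Return a list of strings "
    "with each flavor an element. If none were mentioned return a list with an "
    "element saying 'flavor-missing' only return the list. nothing else."
)

def build_chat(review_text, system_prompt=SYSTEM_PROMPT):
    """Return a two-message chat: the instruction, then the review to evaluate."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": review_text},
    ]

example_chat = build_chat("We got Raspberry and a Root Beer Float.")
for message in example_chat:
    print(message["role"], "->", message["content"][:40])
```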

## Run the pipeline

Running the pipeline returns the conversation with the model's generated reply appended as the last message.

The reply is a string, even though it looks like a list, so we'll need to convert it.

In [125]:
chat_response = pipe(chat, max_new_tokens=512)
flavor_list_string = chat_response[0]['generated_text'][-1]['content']
print(flavor_list_string)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


['Raspberry', 'Mint Chips', 'Root Beer Float']
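
The indexing `chat_response[0]['generated_text'][-1]['content']` is easier to follow on a concrete structure. The sketch below uses a hand-written mock with the shape that indexing implies: a list holding one result, whose 'generated_text' is the chat with the reply as its final message. The mock data and the name `last_reply` are assumptions, not real model output.

```python
def last_reply(chat_response):
    """Return the content of the final message of the first result."""
    return chat_response[0]["generated_text"][-1]["content"]

# Mock data imitating the structure indexed in the cell above.
mock_response = [{
    "generated_text": [
        {"role": "system", "content": "What specific flavors of gelato were mentioned?"},
        {"role": "user", "content": "We got Raspberry."},
        {"role": "assistant", "content": "['Raspberry']"},
    ]
}]

reply = last_reply(mock_response)
print(reply, type(reply).__name__)
```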


In [127]:
type(flavor_list_string)

str

We'll need another module, ast from the Python standard library, to convert the string into an actual Python list object.

ast.literal_eval parses the string as a Python literal, which produces a list object here.

In [128]:
import ast
flavor_list = ast.literal_eval(flavor_list_string)

In [129]:
flavor_list

['Raspberry', 'Mint Chips', 'Root Beer Float']

In [130]:
type(flavor_list)

list
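
A model does not always return a clean list, and `ast.literal_eval` raises an exception on text it cannot parse. One possible safeguard, sketched below, falls back to the 'flavor-missing' marker from the prompt whenever the reply is not a list of strings. The function name `parse_flavor_list` and the fallback policy are assumptions, not part of the original notebook.

```python
import ast

FALLBACK = ["flavor-missing"]

def parse_flavor_list(text):
    """Parse a list-shaped string; return the fallback if parsing fails."""
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, TypeError, SyntaxError):
        # The reply was not a valid Python literal (for example, plain prose).
        return list(FALLBACK)
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    # A valid literal, but not a list of strings.
    return list(FALLBACK)

print(parse_flavor_list("['Raspberry', 'Mint Chips', 'Root Beer Float']"))
print(parse_flavor_list("Sure! The flavors are Raspberry and Mint Chips."))
```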

## Update the reviews with the chat response

Put the list of flavors into the review's dictionary.

Place the extracted list of flavors into the review it came from. We pulled the first review, so the list goes into the first review.

In [132]:
reviews[0]['flavors'] = flavor_list

In [133]:
reviews[0]

{'author_name': 'Jeremy Peters',
 'author_url': 'https://www.google.com/maps/contrib/118110536576238973697/reviews',
 'language': 'en',
 'original_language': 'en',
 'profile_photo_url': 'https://lh3.googleusercontent.com/a/ACg8ocKIf0B_B-qlMBICgTo2WqrKsxW7Dw_WMTOpU2nmrRAvxpsMAA=s128-c0x00000000-cc-rp-mo-ba5',
 'rating': 5,
 'relative_time_description': '4 months ago',
 'text': "Stopped by after a field trip and the kids loved everything! We got Raspberry, Mint Chips, and a Root Beer Float. Cute place, indoor and outdoor seating, large variety of ever changing Gelato flavors. Cozy and quaint or groups, you'll love it all!",
 'time': 1711582958,
 'translated': False,
 'flavors': ['Raspberry', 'Mint Chips', 'Root Beer Float']}

In [134]:
import pandas as pd  # pandas was not imported earlier in the notebook

reviews_df = pd.DataFrame(reviews)
reviews_df

Unnamed: 0,author_name,author_url,language,original_language,profile_photo_url,rating,relative_time_description,text,time,translated,flavors
0,Jeremy Peters,https://www.google.com/maps/contrib/1181105365...,en,en,https://lh3.googleusercontent.com/a/ACg8ocKIf0...,5,4 months ago,Stopped by after a field trip and the kids lov...,1711582958,False,"[Raspberry, Mint Chips, Root Beer Float]"
1,Patience A,https://www.google.com/maps/contrib/1150416657...,en,en,https://lh3.googleusercontent.com/a/ACg8ocKSvX...,4,2 months ago,"Delicious pistachio, salted caramel, tiramisu,...",1715125367,False,
2,Tara Busch,https://www.google.com/maps/contrib/1057857283...,en,en,https://lh3.googleusercontent.com/a-/ALV-UjVrX...,3,in the last week,"The gelato was good,my husband got that and I ...",1722276952,False,
3,Jennifer Walton,https://www.google.com/maps/contrib/1156689050...,en,en,https://lh3.googleusercontent.com/a-/ALV-UjVM_...,5,a year ago,I can’t believe I haven’t reviewed this place ...,1662843339,False,
4,Janice Burgeson Conger,https://www.google.com/maps/contrib/1090608575...,en,en,https://lh3.googleusercontent.com/a-/ALV-UjUL1...,5,2 months ago,Gelato 🍧 at its BEST !!! Service impeccable a...,1714949863,False,


## Explode the dataframe

That sounds dangerous. It's not. Exploding simply creates one row per element of the list column and repeats the other columns, so we get a row for each flavor. This is the format we need for inserting into the database.

In [135]:
exploded_df = reviews_df.explode('flavors')
exploded_df

Unnamed: 0,author_name,author_url,language,original_language,profile_photo_url,rating,relative_time_description,text,time,translated,flavors
0,Jeremy Peters,https://www.google.com/maps/contrib/1181105365...,en,en,https://lh3.googleusercontent.com/a/ACg8ocKIf0...,5,4 months ago,Stopped by after a field trip and the kids lov...,1711582958,False,Raspberry
0,Jeremy Peters,https://www.google.com/maps/contrib/1181105365...,en,en,https://lh3.googleusercontent.com/a/ACg8ocKIf0...,5,4 months ago,Stopped by after a field trip and the kids lov...,1711582958,False,Mint Chips
0,Jeremy Peters,https://www.google.com/maps/contrib/1181105365...,en,en,https://lh3.googleusercontent.com/a/ACg8ocKIf0...,5,4 months ago,Stopped by after a field trip and the kids lov...,1711582958,False,Root Beer Float
1,Patience A,https://www.google.com/maps/contrib/1150416657...,en,en,https://lh3.googleusercontent.com/a/ACg8ocKSvX...,4,2 months ago,"Delicious pistachio, salted caramel, tiramisu,...",1715125367,False,
2,Tara Busch,https://www.google.com/maps/contrib/1057857283...,en,en,https://lh3.googleusercontent.com/a-/ALV-UjVrX...,3,in the last week,"The gelato was good,my husband got that and I ...",1722276952,False,
3,Jennifer Walton,https://www.google.com/maps/contrib/1156689050...,en,en,https://lh3.googleusercontent.com/a-/ALV-UjVM_...,5,a year ago,I can’t believe I haven’t reviewed this place ...,1662843339,False,
4,Janice Burgeson Conger,https://www.google.com/maps/contrib/1090608575...,en,en,https://lh3.googleusercontent.com/a-/ALV-UjUL1...,5,2 months ago,Gelato 🍧 at its BEST !!! Service impeccable a...,1714949863,False,
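
To make the row duplication concrete, here is a pure-Python sketch of the same idea on a list of dictionaries. It is not how pandas implements `explode`; the function name `explode_rows` is hypothetical, and it uses `None` where pandas shows a missing value.

```python
def explode_rows(rows, column):
    """Return one row per list element in `column`, copying the other fields."""
    exploded = []
    for row in rows:
        values = row.get(column)
        if isinstance(values, list) and values:
            for value in values:
                exploded.append({**row, column: value})
        elif isinstance(values, list):
            # An empty list still keeps one row, with no value.
            exploded.append({**row, column: None})
        else:
            # Non-list values (including missing ones) pass through unchanged.
            exploded.append({**row, column: values})
    return exploded

rows = [
    {"author_name": "Jeremy Peters", "flavors": ["Raspberry", "Mint Chips", "Root Beer Float"]},
    {"author_name": "Patience A", "flavors": None},
]
result = explode_rows(rows, "flavors")
for row in result:
    print(row["author_name"], row["flavors"])
```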


In [44]:
from tqdm import tqdm

# How do we loop over all the reviews?

You can use a for loop to iterate over all of the reviews for the business. Wrapping the list in tqdm displays a progress bar while the loop runs.

In [136]:
for review in tqdm(reviews):
  print('do stuff!')

100%|██████████| 5/5 [00:00<00:00, 30977.13it/s]

do stuff!
do stuff!
do stuff!
do stuff!
do stuff!
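
As a rough sketch of what could replace the placeholder, the loop below attaches a parsed flavor list to every review. The LLM call is replaced by a stand-in function so the example runs without a model; `annotate_reviews`, `fake_extract_flavors`, and the sample reviews are all hypothetical. In the notebook, the stand-in would be swapped for a function that builds the chat and runs the pipeline.

```python
import ast

def annotate_reviews(reviews, extract_flavors):
    """Add a 'flavors' list to each review using the given extractor."""
    for review in reviews:
        raw = extract_flavors(review["text"])  # a list-shaped string
        try:
            flavors = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            flavors = ["flavor-missing"]
        review["flavors"] = flavors
    return reviews

def fake_extract_flavors(text):
    """Stand-in for the LLM: keyword matching that returns a list as a string."""
    known = ["Raspberry", "Pistachio"]
    found = [name for name in known if name.lower() in text.lower()]
    return str(found or ["flavor-missing"])

sample_reviews = [
    {"text": "We got Raspberry and loved it."},
    {"text": "Delicious pistachio!"},
    {"text": "Great service."},
]
annotated = annotate_reviews(sample_reviews, fake_extract_flavors)
for review in annotated:
    print(review["flavors"])
```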



