<a href="https://colab.research.google.com/github/lucasaxm/desafio_ia_alura_google/blob/main/Desafio_Imers%C3%A3o_IA_Alura_%2B_Google.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook looks up the user defined below on Letterboxd and, based on their film ratings, generates 5 movie recommendations in the language defined below.

In [91]:
letterboxd_username = 'brendaped'
language = 'Brazilian Portuguese'
sample_size = 15

Letterboxd scraper based on this [article](https://medium.com/@alf.19x/letterboxd-friends-ranker-simple-movie-recommendation-system-80a38dcfb0da).

In [53]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import textwrap
from IPython.display import display
from IPython.display import Markdown

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

def get_random_rows(df, n):
  """Returns n random rows from a DataFrame, or the whole DataFrame if it has fewer than n rows.

  Args:
    df: The DataFrame to sample from.
    n: The number of rows to sample.

  Returns:
    A DataFrame containing n random rows from df, or the whole df if it has fewer than n rows.
  """

  return df.sample(n=min(n, len(df)))

class LetterboxdScraper:
    def __init__(self):
        self.DOMAIN = "https://letterboxd.com"

    def transform_ratings(self, some_str):
        """
        Transforms a raw star rating into its numeric value.
        :param some_str: star rating as displayed by Letterboxd, e.g. "★★★½"
        :return: the rating as a number from 0.5 to 5, or -1 if the string is not a rating
        """
        stars = {
            "★": 1,
            "★★": 2,
            "★★★": 3,
            "★★★★": 4,
            "★★★★★": 5,
            "½": 0.5,
            "★½": 1.5,
            "★★½": 2.5,
            "★★★½": 3.5,
            "★★★★½": 4.5
        }
        # Unknown strings (e.g. an unrated film) map to -1 instead of raising.
        return stars.get(some_str, -1)

    def scrape_films(self, username):
        """
        Scrapes film data for the given username.
        :param username: Letterboxd username to scrape
        :return: pandas DataFrame with one row per film (id, title, rating, liked, link)
        """
        movies_dict = {'id': [], 'title': [], 'rating': [], 'liked': [], 'link': []}

        def parse_page(page_soup):
            # Append every film in the page's poster list to movies_dict.
            ul = page_soup.find("ul", {"class": "poster-list"})
            if ul is None:
                return
            for movie in ul.find_all("li"):
                poster = movie.find('div')
                rating_text = movie.find('p', {"class": "poster-viewingdata"}).get_text().strip()
                movies_dict['id'].append(poster['data-film-id'])
                movies_dict['title'].append(movie.find('img')['alt'])
                movies_dict['rating'].append(self.transform_ratings(rating_text))
                movies_dict['liked'].append(movie.find('span', {'class': 'like'}) is not None)
                movies_dict['link'].append(poster['data-target-link'])

        url = self.DOMAIN + "/" + username + "/films/"
        soup = BeautifulSoup(requests.get(url).content, 'html.parser')

        # The last pagination item holds the page count; it is absent when there is only one page.
        li_pagination = soup.find_all("li", {"class": "paginate-page"})
        if len(li_pagination) == 0:
            parse_page(soup)
        else:
            last_page = int(li_pagination[-1].find('a').get_text().strip())
            for page in range(1, last_page + 1):
                url = self.DOMAIN + "/" + username + "/films/page/" + str(page)
                parse_page(BeautifulSoup(requests.get(url).content, 'html.parser'))

        return pd.DataFrame(movies_dict)
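
The lookup table in `transform_ratings` lists every valid star string by hand. As an alternative, the same value can be derived by counting the characters. The sketch below is a standalone illustration, not part of the scraper: the function name `stars_to_float` is hypothetical, and it assumes that ratings are always written as one to five `★` optionally followed by a single `½`, or a lone `½`. It returns the same `-1` sentinel for anything else.

```python
def stars_to_float(star_text):
    """Convert a star string such as '★★★½' to a number, or -1 if invalid."""
    full = star_text.count("★")
    half = star_text.count("½")
    # Valid strings are all full stars first, then at most one half star,
    # with a total value between 0.5 and 5.
    well_formed = star_text == "★" * full + "½" * half
    value = full + 0.5 * half
    if not well_formed or half > 1 or not (0.5 <= value <= 5):
        return -1
    return value


print(stars_to_float("★★★½"))  # 3.5
print(stars_to_float(""))      # -1 (for example, an unrated film)
```

The explicit dictionary is easier to read at a glance; the counting version avoids keeping ten entries in sync.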

Gemini configuration and the prompt used to generate the recommendations.

In [90]:
import google.generativeai as genai
# Used to securely store your API key
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

generation_config = {
  "candidate_count": 1,
  "temperature": 0.5,
}

safety_settings={
  'HATE': 'BLOCK_NONE',
  'HARASSMENT': 'BLOCK_NONE',
  'SEXUAL' : 'BLOCK_NONE',
  'DANGEROUS' : 'BLOCK_NONE'
}

model = genai.GenerativeModel(model_name = "gemini-1.0-pro-latest",
                              generation_config=generation_config,
                              safety_settings=safety_settings)

def generate_prompt(df, language):
  # Build the instructions followed by the title/rating table serialized as CSV.
  return ("Based on the film ratings table below, recommend 5 movies I might enjoy that are not already listed in the table.\n"
          "IMPORTANT: 0 is the lowest rating and 5 is the highest rating, don't recommend movies similar to those that are scored below 2.5.\n"
          "For each recommendation, provide a brief synopsis and explain why you think I'll enjoy it based on my past ratings. "
          f"Your response should be formatted as markdown cards with the Title in {language}, and below that the description on why you recommend this movie.\n"
          f"{df[['title','rating']].to_csv(index=False)}")
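
The prompt embeds the ratings as CSV. Since `transform_ratings` returns `-1` for films without a star rating, such rows would reach the model as a rating of `-1`, below the 0 to 5 scale the prompt describes. The sketch below shows one way to drop them before serializing. It is an illustration under assumptions: it works on plain `(title, rating)` tuples rather than the DataFrame, and `ratings_to_csv` and `UNRATED` are hypothetical names.

```python
import csv
import io

UNRATED = -1  # sentinel used by transform_ratings for films without stars


def ratings_to_csv(films):
    """Render (title, rating) pairs as CSV text, skipping unrated films."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "rating"])
    for title, rating in films:
        if rating == UNRATED:
            continue  # leave unrated films out of the prompt
        writer.writerow([title, rating])
    return buffer.getvalue()


films = [("Dune", 5.0), ("Midsommar", 1.0), ("Some Unrated Film", -1)]
print(ratings_to_csv(films))
```

With the DataFrame used in this notebook, a comparable filter would be `df[df['rating'] >= 0]` applied before calling `generate_prompt`.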

In [92]:
scraper = LetterboxdScraper()
df_letterboxd = scraper.scrape_films(letterboxd_username)
df_letterboxd_sample = get_random_rows(df_letterboxd, sample_size)
df_letterboxd_sample

| | id | title | rating | liked | link |
|---|---|---|---|---|---|
| 17 | 37833 | Scott Pilgrim vs. the World | 5.0 | False | /film/scott-pilgrim-vs-the-world/ |
| 10 | 371378 | Dune | 5.0 | False | /film/dune-2021/ |
| 2 | 619510 | The Hunger Games: The Ballad of Songbirds & Sn... | 3.0 | False | /film/the-hunger-games-the-ballad-of-songbirds... |
| 3 | 710352 | Poor Things | 4.0 | False | /film/poor-things-2023/ |
| 0 | 834656 | Civil War | 4.0 | False | /film/civil-war-2024/ |
| 9 | 474474 | Everything Everywhere All at Once | 4.0 | False | /film/everything-everywhere-all-at-once/ |
| 15 | 257426 | Arrival | 5.0 | False | /film/arrival-2016/ |
| 13 | 251943 | Spider-Man: Into the Spider-Verse | 5.0 | False | /film/spider-man-into-the-spider-verse/ |
| 12 | 426406 | Parasite | 5.0 | False | /film/parasite-2019/ |
| 8 | 682547 | Nope | 5.0 | False | /film/nope/ |
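
The sample above is drawn without a fixed seed, so each run sends a different set of films to the model. The sketch below illustrates seeded sampling with the standard library only; `sample_rows` is a hypothetical helper operating on a plain list, not a function from this notebook.

```python
import random


def sample_rows(rows, n, seed=None):
    """Return up to n randomly chosen items from rows.

    Passing the same seed yields the same selection on every run.
    """
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


films = ["Dune", "Arrival", "Nope", "Parasite", "Poor Things"]
first = sample_rows(films, 3, seed=42)
second = sample_rows(films, 3, seed=42)
print(first == second)  # True: same seed, same sample
```

For the DataFrame itself, `DataFrame.sample` accepts a `random_state` argument that serves the same purpose.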


In [93]:
prompt = generate_prompt(df_letterboxd_sample, language)
print(prompt)

Based on the film ratings table below, recommend 5 movies I might enjoy that are not already listed in the table.
IMPORTANT: 0 is the lowest rating and 5 is the highest rating, don't recommend movies similar to those that are scored below 2.5.
For each recommendation, provide a brief synopsis and explain why you think I'll enjoy it based on my past ratings. Your response should be formatted as markdown cards with the Title in Brazilian Portuguese, and below that the description on why you recommend this movie.
title,rating
Scott Pilgrim vs. the World,5.0
Dune,5.0
The Hunger Games: The Ballad of Songbirds & Snakes,3.0
Poor Things,4.0
Civil War,4.0
Everything Everywhere All at Once,4.0
Arrival,5.0
Spider-Man: Into the Spider-Verse,5.0
Parasite,5.0
Nope,5.0
Midsommar,1.0
Spider-Man: Across the Spider-Verse,5.0
Oppenheimer,4.0
Spirited Away,5.0
About Time,5.0



Running the prompt and displaying the response as Markdown.



In [94]:
response = model.generate_content(prompt)

display(to_markdown(response.text))

> ### **Cidade de Deus**
> 
> Um retrato cru e comovente da vida no subúrbio do Rio de Janeiro, onde a violência e a pobreza são a norma. Você apreciará a representação realista e o enredo envolvente, semelhantes a "Parasite" e "Cidade dos Sonhos".
> 
> ### **O Tigre e o Dragão**
> 
> Uma épica de artes marciais visualmente deslumbrante que segue a jornada de uma guerreira aposentada que deve recuperar sua espada roubada. Você ficará fascinado pela coreografia de luta deslumbrante e pela história envolvente, que lembra "Dune" e "Spider-Man: Into the Spider-Verse".
> 
> ### **Amélie Poulain**
> 
> Uma comédia romântica peculiar e encantadora que segue a vida de uma jovem peculiar em Paris. Você vai adorar o humor excêntrico e o estilo visual único, semelhantes a "About Time" e "Everything Everywhere All at Once".
> 
> ### **O Fabuloso Destino de Amélie Poulain**
> 
> Uma fábula moderna comovente e inspiradora que segue a jornada de um homem que decide mudar sua vida para melhor. Você se identificará com a mensagem de esperança e otimismo, semelhantes a "Arrival" e "Spirited Away".
> 
> ### **O Resgate do Soldado Ryan**
> 
> Um drama de guerra épico e comovente que retrata o desembarque do Dia D na Normandia. Você apreciará a ação intensa e o retrato realista da guerra, semelhantes a "Civil War" e "Oppenheimer".
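
Every line of the response above starts with `> `, including the blank ones, because `to_markdown` calls `textwrap.indent` with a predicate that always returns true. By default `textwrap.indent` skips lines that contain only whitespace, which would break the block quote into separate pieces. The standalone sketch below reproduces just that step; `quote_markdown` is a hypothetical name used for illustration.

```python
import textwrap


def quote_markdown(text):
    """Prefix every line, blank ones included, with the Markdown quote marker."""
    return textwrap.indent(text, "> ", predicate=lambda _line: True)


sample = "### Title\n\nDescription"
print(quote_markdown(sample))
# Without the predicate the blank line stays unprefixed:
print(textwrap.indent(sample, "> "))
```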