# Parse the top 10 rated movies from this URL:

https://learnwithshin.github.io/docs/practices/top_rated_movies/

--------------------------------------
Our end result should look like this:

```python
[{"title": "The Shawshank Redemption", "year": "(1994)", "rating": "9.2"},...]
```

In [1]:
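import requests

# download the page; raise_for_status() stops early on an HTTP error (4xx/5xx)
res = requests.get("https://learnwithshin.github.io/docs/practices/top_rated_movies/")
res.raise_for_status()

# res.content is raw bytes, so decode it into a string
page_content = res.content.decode("utf-8")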

In [4]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(page_content, "html.parser")


In [10]:
# find the <article> tag to narrow down the search
article = soup.find("article")

# digging a bit further inside <article>, you will find a <table> tag
# a <table> tag represents a table in HTML
table = article.find("table")


In [14]:
# a table's rows are <tr> tags, which sit under <tbody>
table_body = table.find("tbody")
# find_all returns every <tr> inside <tbody>
rows = table_body.find_all("tr")
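
The same navigation can be tried offline on a small inline snippet. This is only a minimal sketch: `SAMPLE_HTML` and the `find_rows` helper are hypothetical stand-ins (the real page's markup may differ), and the `None` check is an added precaution for pages where one of the tags is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the actual markup may differ.
SAMPLE_HTML = """
<article>
  <table>
    <thead><tr><th>Title</th><th>Year</th><th>Rating</th></tr></thead>
    <tbody>
      <tr><td>The Shawshank Redemption</td><td>(1994)</td><td>9.2</td></tr>
      <tr><td>The Godfather</td><td>(1972)</td><td>9.1</td></tr>
    </tbody>
  </table>
</article>
"""


def find_rows(html):
    """Return the <tr> tags under <article> -> <table> -> <tbody>, or [] if one is missing."""
    node = BeautifulSoup(html, "html.parser")
    for tag_name in ("article", "table", "tbody"):
        node = node.find(tag_name)
        if node is None:  # find() returns None when no matching tag exists
            return []
    return node.find_all("tr")


sample_rows = find_rows(SAMPLE_HTML)
print(len(sample_rows))  # the header row lives in <thead>, so it is not counted
```

Without such a check, calling `.find` on a `None` result raises an `AttributeError`, which is usually the first sign that the page structure is not what you expected.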

In [20]:
# create an empty list to store the result
result = []

# since our end result should look like this
# [{"title": "The Shawshank Redemption", "year": "(1994)", "rating": "9.2"},...]

# we will iterate through the rows and collect the data into the result list
for row in rows:
    # remember, each row is still an HTML tag, so we can get all 3 data points by finding every <td>
    # you can think of each <td> as an Excel cell that holds the actual data
    cells = row.find_all("td")
    # each <td> tag has a "text" attribute that holds its text content
    title = cells[0].text
    year = cells[1].text
    rating = cells[2].text
    
    # finally just format the record and append to the result list
    result.append({"title": title, "year": year, "rating": rating})
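
The target format keeps every value as a string, including the parentheses around the year. If numeric values are needed later (for sorting or filtering, say), a small conversion step could look like the sketch below. `clean_record` is a hypothetical helper, not part of the original exercise.

```python
def clean_record(record):
    """Return a copy of the record with year as an int and rating as a float."""
    return {
        "title": record["title"].strip(),
        # "(1994)" -> "1994" -> 1994
        "year": int(record["year"].strip().strip("()")),
        "rating": float(record["rating"]),
    }


raw = {"title": "The Shawshank Redemption", "year": "(1994)", "rating": "9.2"}
print(clean_record(raw))
# {'title': 'The Shawshank Redemption', 'year': 1994, 'rating': 9.2}
```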

In [24]:
# result now holds every row of the table :)
# we only need the top 10, and since the table is already sorted by rating
# we can simply slice the first 10 items
result[:10]

# there we have our result :)

[{'title': 'The Shawshank Redemption', 'year': '(1994)', 'rating': '9.2'},
 {'title': 'The Godfather', 'year': '(1972)', 'rating': '9.1'},
 {'title': 'The Godfather: Part II', 'year': '(1974)', 'rating': '9.0'},
 {'title': 'The Dark Knight', 'year': '(2008)', 'rating': '9.0'},
 {'title': '12 Angry Men', 'year': '(1957)', 'rating': '8.9'},
 {'title': "Schindler's List", 'year': '(1993)', 'rating': '8.9'},
 {'title': 'The Lord of the Rings: The Return of the King',
  'year': '(2003)',
  'rating': '8.9'},
 {'title': 'Pulp Fiction', 'year': '(1994)', 'rating': '8.8'},
 {'title': 'The Good, the Bad and the Ugly',
  'year': '(1966)',
  'rating': '8.8'},
 {'title': 'The Lord of the Rings: The Fellowship of the Ring',
  'year': '(2001)',
  'rating': '8.8'}]
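
Slicing relies on the page already being sorted by rating. If that ordering cannot be assumed, one option is to sort explicitly before slicing. The sketch below uses a hypothetical `top_n` helper and a small hand-made sample rather than the scraped data.

```python
def top_n(records, n=10):
    """Return the n highest-rated records; ties keep their original order."""
    # ratings are strings such as "9.2", so convert to float for comparison
    return sorted(records, key=lambda r: float(r["rating"]), reverse=True)[:n]


sample = [
    {"title": "Pulp Fiction", "year": "(1994)", "rating": "8.8"},
    {"title": "The Shawshank Redemption", "year": "(1994)", "rating": "9.2"},
    {"title": "The Godfather", "year": "(1972)", "rating": "9.1"},
]
print([movie["title"] for movie in top_n(sample, 2)])
# ['The Shawshank Redemption', 'The Godfather']
```

Because Python's `sorted` is stable, movies with equal ratings stay in the order they appeared on the page.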

## Bonus

We can convert this to a pandas DataFrame which can help us process the data if needed!

In [30]:
import pandas as pd

# this works because a list of dicts is a format that a pandas DataFrame can ingest directly
movies = pd.DataFrame(result)

In [28]:
top_10_movies = movies.head(10)

In [29]:
top_10_movies  # displayed as a nicely formatted DataFrame

|   | title | year | rating |
|---|-------|------|--------|
| 0 | The Shawshank Redemption | (1994) | 9.2 |
| 1 | The Godfather | (1972) | 9.1 |
| 2 | The Godfather: Part II | (1974) | 9.0 |
| 3 | The Dark Knight | (2008) | 9.0 |
| 4 | 12 Angry Men | (1957) | 8.9 |
| 5 | Schindler's List | (1993) | 8.9 |
| 6 | The Lord of the Rings: The Return of the King | (2003) | 8.9 |
| 7 | Pulp Fiction | (1994) | 8.8 |
| 8 | The Good, the Bad and the Ugly | (1966) | 8.8 |
| 9 | The Lord of the Rings: The Fellowship of the Ring | (2001) | 8.8 |


In [32]:
# and of course you can now save the DataFrame as a CSV file if you want to
movies.to_csv("movies_with_ratings.csv")
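
By default, `to_csv` also writes the DataFrame's index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. If that column is not wanted, passing `index=False` is one way to avoid it. The sketch below writes to an in-memory buffer instead of a file and uses a two-row sample; both are assumptions made to keep the example self-contained.

```python
import io

import pandas as pd

sample = pd.DataFrame(
    [
        {"title": "The Shawshank Redemption", "year": "(1994)", "rating": "9.2"},
        {"title": "The Godfather", "year": "(1972)", "rating": "9.1"},
    ]
)

# default behaviour: the index is written as an extra, unnamed column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
print(list(pd.read_csv(with_index).columns))

# index=False keeps only the data columns
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
restored = pd.read_csv(without_index)
print(list(restored.columns))
```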