# Web Scraping Lab

This notebook contains some web scraping exercises to practise your scraping skills.

**Tips:**

- Check the response status code for each request to ensure you have obtained the intended content.
- Print the response text in each request to understand the kind of info you are getting and its format.
- Check for patterns in the response text to extract the data/info requested in each question.
- Visit the URLs below and inspect their source code with Chrome DevTools. You'll need to identify the HTML tags, class names, etc. used by the content you are expected to extract.
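
Following the first tip, a small helper can refuse to parse a page whose status code signals an error. This is a minimal sketch: `soup_from_response` and `get_soup` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
import requests
from bs4 import BeautifulSoup


def soup_from_response(response, parser="html.parser"):
    """Parse a requests.Response with BeautifulSoup, failing loudly on 4xx/5xx codes."""
    # raise_for_status() raises requests.HTTPError for client and server errors
    response.raise_for_status()
    return BeautifulSoup(response.content, parser)


def get_soup(url, timeout=10):
    """Download url and return the parsed page (needs network access)."""
    return soup_from_response(requests.get(url, timeout=timeout))
```

For example, `get_soup('https://github.com/trending/developers')` either returns the parsed page or raises an error that names the failing status code.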

**Resources**:
- [Requests library](http://docs.python-requests.org/en/master/#the-user-guide)
- [Beautiful Soup Doc](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Urllib](https://docs.python.org/3/library/urllib.html#module-urllib)
- [re lib](https://docs.python.org/3/library/re.html)
- [lxml lib](https://lxml.de/)
- [Scrapy](https://scrapy.org/)
- [List of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
- [HTML basics](http://www.simplehtmlguide.com/cheatsheet.php)
- [CSS basics](https://www.cssbasics.com/#page_start)

#### Below are the libraries and modules you may need. `requests`, `BeautifulSoup` and `pandas` are already imported for you. If you prefer to use additional libraries, feel free to do so.

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

#### Download, parse (using BeautifulSoup), and print the content from the Trending Developers page from GitHub:

In [2]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/developers'
html = requests.get(url).content
code_source = BeautifulSoup(html, "html.parser")

#### Display the names of the trending developers retrieved in the previous step.

Your output should be a Python list of developer names. Each name should not contain any html tag.

**Instructions:**

1. Find out the html tag and class names used for the developer names. You can achieve this using Chrome DevTools.

1. Use BeautifulSoup to extract all the html elements that contain the developer names.

1. Use string manipulation techniques to remove whitespace and line breaks (e.g. `\n`) from the *text* of each html element. Use a list to store the clean names.

1. Print the list of names.

Your output should look like below:

```
['trimstray (@trimstray)',
 'joewalnes (JoeWalnes)',
 'charlax (Charles-AxelDein)',
 'ForrestKnight (ForrestKnight)',
 'revery-ui (revery-ui)',
 'alibaba (Alibaba)',
 'Microsoft (Microsoft)',
 'github (GitHub)',
 'facebook (Facebook)',
 'boazsegev (Bo)',
 'google (Google)',
 'cloudfetch',
 'sindresorhus (SindreSorhus)',
 'tensorflow',
 'apache (TheApacheSoftwareFoundation)',
 'DevonCrawford (DevonCrawford)',
 'ARMmbed (ArmMbed)',
 'vuejs (vuejs)',
 'fastai (fast.ai)',
 'QiShaoXuan (Qi)',
 'joelparkerhenderson (JoelParkerHenderson)',
 'torvalds (LinusTorvalds)',
 'CyC2018',
 'komeiji-satori (神楽坂覚々)',
 'script-8']
 ```
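
Step 3 (removing whitespace and line breaks) can be sketched on a tiny hand-written fragment. The markup below is a simplified, hypothetical stand-in for a trending-developer card. It is not GitHub's actual HTML.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one <article> per developer, name inside <h1><a>
sample_html = """
<article><h1 class="h3"><a href="/monalisa">
    Mona   Lisa
</a></h1></article>
<article><h1 class="h3"><a href="/torvalds">Linus Torvalds</a></h1></article>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Remove every whitespace character (spaces, tabs, newlines), as the instructions ask
names = [re.sub(r"\s+", "", art.h1.a.get_text()) for art in soup.find_all("article")]
print(names)  # ['MonaLisa', 'LinusTorvalds']
```

`get_text()` is used instead of `.string` because `.string` returns `None` when a tag has several children. `get_text()` concatenates all nested text.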

In [3]:
# The developer cards sit inside this container div
developer_names = code_source.find_all("div", {"class":"position-relative container-lg p-responsive pt-6"})


In [4]:
articles = developer_names[0].find_all("article")

# Remove spaces and line breaks from each name; skip articles whose <h1> link has no plain string (.string is None)
names = [art.h1.a.string.replace(' ', '').replace('\n', '')
         for art in articles if art.h1 and art.h1.a and art.h1.a.string]
names


['FrancoisZaninotto',
 'FonsvanderPlas',
 'RichHarris',
 'StefanoGottardo',
 'JesseDuffield',
 'HaThach',
 'franciscosouza',
 'Barryvd.Heuvel',
 'JonathanReinink',
 'BradFitzpatrick',
 'SebastiánRamírez',
 'DavidTolnay',
 'MladenMacanović',
 'GlebBahmutov',
 'ArtemZakharchenko',
 'MikeMcQuaid',
 'OmryYadan',
 'MariuszNowak',
 'EricLiu',
 'MikePenz',
 'SteveSmith',
 'MartenSeemann',
 'JoshBleecherSnyder',
 'JacobQuinn',
 'HadleyWickham']

#### Display the trending Python repositories in GitHub.

The steps to solve this problem are similar to the previous ones, except that you need to find the repository names instead of the developer names.

In [5]:
# This is the url you will scrape in this exercise
url2 = 'https://github.com/trending/python?since=daily'

In [6]:
html2 = requests.get(url2).content
code_source_repo = BeautifulSoup(html2, "html.parser")

In [7]:
repo_names = code_source_repo.find_all("div", {"class":"position-relative container-lg p-responsive pt-6"})

In [8]:
articles_repo = repo_names[0].find_all("article")

# Keep the text of the <span> in each <h1><a>, without spaces, newlines or slashes
[str(art.h1.a.span.string).replace(' ', '')
 .replace('\n', '').replace('/', '')
 for art in articles_repo]

['archlinux',
 'ytdl-org',
 'facebookresearch',
 'Rapptz',
 'gto76',
 'philippnormann',
 'fudan-zvg',
 'hpyproject',
 'beurtschipper',
 'projectdiscovery',
 'python-telegram-bot',
 'matplotlib',
 'frappe',
 'fireeye',
 'CastagnaIT',
 'Hari-Nagarajan',
 '3b1b',
 'lukemelas',
 'PostHog',
 'huggingface',
 'open-mmlab',
 'pytorch',
 'TheAlgorithms',
 'JunMa11',
 'ManimCommunity']
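
The output suggests that the `<span>` inside each `<h1><a>` holds only the owner (for example `archlinux`). The list above therefore contains owners, not full repository names. Below is a sketch that keeps the whole link text instead. It runs on a simplified, hypothetical version of the markup, so check the real structure in DevTools.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: owner in a <span>, repository name as the remaining link text
sample_html = """
<article><h1><a href="/ytdl-org/youtube-dl">
  <span class="text-normal">ytdl-org /</span>
  youtube-dl
</a></h1></article>
"""

soup = BeautifulSoup(sample_html, "html.parser")
articles = soup.find_all("article")

# get_text() joins the owner and repo parts; dropping all whitespace yields "owner/repo"
full_names = [re.sub(r"\s+", "", art.h1.a.get_text()) for art in articles]

# The href attribute carries the same information without any text cleanup
hrefs = [art.h1.a["href"].strip("/") for art in articles]
```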

#### Display all the image links from the Walt Disney Wikipedia page.

In [9]:
# This is the url you will scrape in this exercise
url3 = 'https://en.wikipedia.org/wiki/Walt_Disney'

In [10]:
html3 = requests.get(url3).content
code_source_image = BeautifulSoup(html3, "html")

In [12]:
# Collect every <img> tag and read its src attribute (None if the tag has no src)
images = code_source_image.find_all("img")
link_images = [image.get("src") for image in images]
link_images
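
Some `src` values may be relative (for instance starting with `//` or `/`) rather than full URLs. `urllib.parse.urljoin` (from the Urllib resource listed at the top) resolves them against the page URL. The sample values below are illustrative and were not scraped.

```python
from urllib.parse import urljoin

page_url = "https://en.wikipedia.org/wiki/Walt_Disney"

# Illustrative src values: protocol-relative, root-relative, already absolute, missing
srcs = [
    "//upload.wikimedia.org/wikipedia/commons/example.jpg",
    "/static/images/icons/wikipedia.png",
    "https://example.org/logo.png",
    None,  # an <img> without a src attribute yields None from .get("src")
]

# Skip missing values and resolve the rest against the page URL
absolute = [urljoin(page_url, src) for src in srcs if src]
print(absolute)
```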

#### Retrieve an arbitrary Wikipedia page about "Python" and create a list of the links on that page.

In [13]:
# This is the url you will scrape in this exercise
url4 ='https://en.wikipedia.org/wiki/Python' 

In [14]:
html4 = requests.get(url4).content


In [15]:
soup4 = BeautifulSoup(html4, "html.parser")
soup4_find = soup4.find_all("a")

link_list = [link.get("href") for link in soup4_find]
link_list

[None,
 '#mw-head',
 '#searchInput',
 'https://en.wiktionary.org/wiki/Python',
 'https://en.wiktionary.org/wiki/python',
 '/wiki/Pythons',
 '/wiki/Python_(genus)',
 '#Computing',
 '#People',
 '#Roller_coasters',
 '#Vehicles',
 '#Weaponry',
 '#Other_uses',
 '#See_also',
 '/w/index.php?title=Python&action=edit&section=1',
 '/wiki/Python_(programming_language)',
 '/wiki/CMU_Common_Lisp',
 '/wiki/PERQ#PERQ_3',
 '/w/index.php?title=Python&action=edit&section=2',
 '/wiki/Python_of_Aenus',
 '/wiki/Python_(painter)',
 '/wiki/Python_of_Byzantium',
 '/wiki/Python_of_Catana',
 '/wiki/Python_Anghelo',
 '/w/index.php?title=Python&action=edit&section=3',
 '/wiki/Python_(Efteling)',
 '/wiki/Python_(Busch_Gardens_Tampa_Bay)',
 '/wiki/Python_(Coney_Island,_Cincinnati,_Ohio)',
 '/w/index.php?title=Python&action=edit&section=4',
 '/wiki/Python_(automobile_maker)',
 '/wiki/Python_(Ford_prototype)',
 '/w/index.php?title=Python&action=edit&section=5',
 '/wiki/Python_(missile)',
 '/wiki/Python_(nuclear_prima
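
The raw list mixes several kinds of entries: `None` (anchors without `href`), in-page fragments such as `#mw-head`, edit links and duplicates. Below is one possible cleanup that keeps only `/wiki/` article links in order of first appearance. These filtering rules are a choice, not a requirement of the exercise. The sample values are taken from the output above.

```python
def article_links(hrefs):
    """Keep /wiki/ article links, skipping None, fragments and repeats (order preserved)."""
    seen = set()
    result = []
    for href in hrefs:
        if not href or not href.startswith("/wiki/"):
            continue  # None, '#...' fragments, /w/index.php edit links, external URLs
        if href not in seen:
            seen.add(href)
            result.append(href)
    return result


sample = [None, "#mw-head", "/wiki/Pythons", "/w/index.php?title=Python&action=edit&section=1",
          "/wiki/Pythons", "https://en.wiktionary.org/wiki/Python", "/wiki/Python_(genus)"]
print(article_links(sample))  # ['/wiki/Pythons', '/wiki/Python_(genus)']
```

In the notebook, the same function can be applied to the full list with `article_links(link_list)`.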

#### Find the number of titles that have changed in the United States Code since its last release point.

In [16]:
# This is the url you will scrape in this exercise
url5 = 'http://uscode.house.gov/download/download.shtml'

In [17]:
html5 = requests.get(url5).content

In [18]:
soup5 = BeautifulSoup(html5, "html.parser")
# Titles changed since the last release point carry the class "usctitlechanged"
changed_titles = soup5.find_all("div", {"class": "usctitlechanged"})
len(changed_titles)

4

#### Create a Python list with the names of the FBI's Ten Most Wanted fugitives.

In [19]:
# This is the url you will scrape in this exercise
url6 = 'https://www.fbi.gov/wanted/topten'

In [20]:
html6 = requests.get(url6).content

In [21]:
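soup6 = BeautifulSoup(html6, "html.parser")
# First tile with these classes; it is assumed to contain the fugitives listing
soup6_find = soup6.find_all("div", {"class": "movable removable mosaic-tile mosaic-castle.cms.querylisting-tile"})[0]

# Split the tile's text into lines and keep the non-empty ones
names_fugitives = [line.strip() for line in soup6_find.text.split("\n") if line.strip()]
names_fugitives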
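
Splitting the tile's text also returns any non-name lines it contains. Suppose each fugitive's name sits in its own heading link inside the tile. That is an assumption about the page, so inspect it in DevTools first. In that case a CSS selector returns only the names. The markup and class names below are hypothetical placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for the listing tile; the real tags and classes must be checked in DevTools
sample_html = """
<div class="querylisting-tile">
  <ul>
    <li><h3 class="title"><a href="/wanted/a">JOHN DOE</a></h3><p>description</p></li>
    <li><h3 class="title"><a href="/wanted/b">JANE ROE</a></h3></li>
  </ul>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# select() takes a CSS selector: every <a> inside an <h3 class="title">
names = [a.get_text(strip=True) for a in soup.select("h3.title a")]
print(names)  # ['JOHN DOE', 'JANE ROE']
```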


#### Display the information (date, time, latitude, longitude and region name) for the 20 latest earthquakes reported by the EMSC as a pandas dataframe.

In [4]:
# This is the url you will scrape in this exercise
url7 = 'https://www.emsc-csem.org/Earthquake/'

In [25]:
html7 = requests.get(url7).content

soup7 = BeautifulSoup(html7, "html.parser")
soup7_find = soup7.find_all("table")[3]


rows = soup7_find("tr")
rows = [row.text.split() for row in rows]

colnames = ('response', 'date&time', 'latitude', 'longitude', 'km', 'mag', 'region name', '0', '0', '0', '0', '0', '0', '0')
data = rows[5:]

df_soup7_find = pd.DataFrame(data, columns = colnames)
df_soup7_find


Unnamed: 0,response,date&time,latitude,longitude,km,mag,region name,0,0.1,0.2,0.3,0.4,0.5,0.6
0,earthquake2021-04-03,12:07:53.023min,ago25.78,S,70.44,W,50ML3.0,"ANTOFAGASTA,",CHILE2021-04-03,12:25,,,,
1,earthquake2021-04-03,11:56:50.034min,ago8.94,S,122.43,E,103,M4.1,FLORES,"REGION,",INDONESIA2021-04-03,12:05,,
2,earthquake2021-04-03,11:48:29.042min,ago9.78,S,120.66,E,14,M3.3,SUMBA,"REGION,",INDONESIA2021-04-03,11:55,,
3,earthquake2021-04-03,11:45:27.745min,ago43.05,N,2.01,W,10ML1.6,SPAIN2021-04-03,11:51,,,,,
4,earthquake2021-04-03,11:41:39.049min,ago38.48,N,15.32,E,158ML2.2,"SICILY,",ITALY2021-04-03,11:59,,,,
5,earthquake2021-04-03,11:31:30.059min,ago39.59,N,38.17,E,13ML2.0,EASTERN,TURKEY2021-04-03,12:09,,,,
6,earthquake2021-04-03,11:30:28.01hr,00min,ago32.47,S,71.67,W,61ML3.5,OFFSHORE,"VALPARAISO,",CHILE2021-04-03,11:52,,
7,earthquake2021-04-03,11:27:01.01hr,03min,ago0.07,N,121.93,E,10,M2.6,"MINAHASA,","SULAWESI,",INDONESIA2021-04-03,11:55,
8,earthquake2021-04-03,11:23:38.91hr,07min,ago27.89,N,16.32,W,1ML1.6,CANARY,"ISLANDS,",SPAIN,REGION2021-04-03,11:40,
9,earthquake2021-04-03,11:11:29.71hr,19min,ago57.85,S,7.62,W,10mb5.0,EAST,OF,SOUTH,SANDWICH,ISLANDS2021-04-03,11:35


In [5]:
html7 = requests.get(url7).content
soup = BeautifulSoup(html7, "lxml")
earthquakes = soup.find('tbody', {'id': 'tbody'}).find_all("tr")

nelem = 20
latest_earthquakes = []

for earthquake in earthquakes[:nelem]:
    # Date and time
    date, time = earthquake.find('td', {'class': 'tabev6'}).find('a').text.split()
    # Latitude and longitude (several variables can be unpacked from a single find_all)
    lat_deg, lon_deg = earthquake.find_all('td', {'class': 'tabev1'})
    lat_dir, lon_dir, magnitude = earthquake.find_all('td', {'class': 'tabev2'})
    lat_deg = f"{lat_deg.text.strip()} {lat_dir.text.strip()}"
    lon_deg = f"{lon_deg.text.strip()} {lon_dir.text.strip()}"
    # Region
    region = earthquake.find('td', {'class': 'tb_region'}).text.strip()
    # Collect this earthquake's fields and append them
    earthquake_summary = [date, time, lat_deg, lon_deg, region]
    latest_earthquakes.append(earthquake_summary)

df = pd.DataFrame(latest_earthquakes, columns=['Date', 'Time', 'Latitude', 'Longitude', 'Region'])
df

|   | Date | Time | Latitude | Longitude | Region |
|---|---|---|---|---|---|
| 0 | 2021-04-07 | 18:20:03.8 | 35.09 N | 116.97 W | SOUTHERN CALIFORNIA |
| 1 | 2021-04-07 | 18:10:30.5 | 38.14 N | 117.85 W | NEVADA |
| 2 | 2021-04-07 | 18:00:44.9 | 38.80 N | 15.71 E | SICILY, ITALY |
| 3 | 2021-04-07 | 17:52:36.7 | 28.16 N | 15.08 W | CANARY ISLANDS, SPAIN REGION |
| 4 | 2021-04-07 | 17:51:30.0 | 9.33 N | 83.77 W | COSTA RICA |
| 5 | 2021-04-07 | 17:42:45.0 | 4.14 N | 96.28 E | NORTHERN SUMATRA, INDONESIA |
| 6 | 2021-04-07 | 17:30:23.0 | 0.52 N | 126.63 E | MOLUCCA SEA |
| 7 | 2021-04-07 | 17:29:07.0 | 19.19 N | 155.48 W | ISLAND OF HAWAII, HAWAII |
| 8 | 2021-04-07 | 17:25:17.0 | 1.33 S | 120.55 E | SULAWESI, INDONESIA |
| 9 | 2021-04-07 | 17:14:29.1 | 34.96 S | 179.05 E | SOUTH OF KERMADEC ISLANDS |
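
The coordinates are stored as text such as `116.97 W`. To filter or plot them, they can be converted into signed decimal degrees, using the usual convention that south and west are negative. `to_signed_degrees` is a hypothetical helper name, and the sample rows are copied from the output above.

```python
import pandas as pd


def to_signed_degrees(value):
    """Convert a string such as '116.97 W' into signed decimal degrees (S and W negative)."""
    number, hemisphere = value.split()
    degrees = float(number)
    return -degrees if hemisphere.upper() in ("S", "W") else degrees


# Two rows copied from the output above
coords = pd.DataFrame({"Latitude": ["35.09 N", "1.33 S"], "Longitude": ["116.97 W", "120.55 E"]})
coords["lat"] = coords["Latitude"].map(to_signed_degrees)
coords["lon"] = coords["Longitude"].map(to_signed_degrees)
print(coords)
```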


#### Count the number of tweets by a given Twitter account.

Note for the Twitter web scraping exercise: in 2020, Twitter added protection against scraping by requiring JavaScript, so the technique based on `requests.get` no longer works.

Ask the user for the handle (@handle) of a twitter account. You will need to include a ***try/except block*** for account names not found. 
<br>***Hint:*** the program should count the number of tweets for any provided account.

In [7]:
# This is the url you will scrape in this exercise 
# You will need to append the account handle to this url
url8 = 'https://twitter.com/'

In [8]:
# Accept the handle with or without its leading '@'
username = input('Please, input your username: ').strip().lstrip('@')
html = requests.get(url8 + username).content
soup = BeautifulSoup(html, "lxml")

try:
    tweet_box = soup.find('li', {'class':'ProfileNav-item ProfileNav-item--tweets is-active'})
    tweets = tweet_box.find('a').find('span', {'class':'ProfileNav-value'})
    print("{} has {} tweets.".format(username, tweets.get('data-count')))
except AttributeError:
    # find() returned None: the profile block is missing (unknown account or JavaScript-only page)
    print('Account name not found...')


Please, input your username:  incautiouswifi


Account name not found...


#### Number of followers of a given twitter account
Ask the user for the handle (@handle) of a twitter account. You will need to include a ***try/except block*** for account names not found. 
<br>***Hint:*** the program should count the followers for any provided account.

In [189]:
# This is the url you will scrape in this exercise 
# The account handle is already the last part of this url
url9 = 'https://twitter.com/EmmanuelMacron'

In [198]:
html9 = requests.get(url9).content

# Parse the page downloaded just above
soup9 = BeautifulSoup(html9, "html.parser")
soup9_find = soup9.find_all("body")


#### List all language names and number of related articles in the order they appear in wikipedia.org.

In [9]:
# This is the url you will scrape in this exercise
url10 = 'https://www.wikipedia.org/'

In [16]:
html10 = requests.get(url10).content

soup10 = BeautifulSoup(html10, "html.parser")
soup10_find = soup10.find_all('a', {'class': 'link-box'})

language_list = [str(language.text.split()) for language in soup10_find]
language_list


["['English', '6', '274', '000+', 'articles']",
 "['Español', '1', '668', '000+', 'artículos']",
 "['日本語', '1', '259', '000+', '記事']",
 "['Deutsch', '2', '553', '000+', 'Artikel']",
 "['Русский', '1', '708', '000+', 'статей']",
 "['Français', '2', '311', '000+', 'articles']",
 "['Italiano', '1', '681', '000+', 'voci']",
 "['中文', '1', '185', '000+', '條目']",
 "['Português', '1', '061', '000+', 'artigos']",
 "['Polski', '1', '463', '000+', 'haseł']"]

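
The cell above stores each entry as the string form of a token list, which is why every item looks like a printed list. Below is a sketch that turns the same text into a `(language, article count)` pair. It assumes the count is the run of digit groups (optionally ending in `+`) that follows the language name, as in the output above. `parse_link_box` is a hypothetical helper name.

```python
import re

NUMBER_GROUP = re.compile(r"\d+\+?")  # digit groups such as '6', '274' or '000+'


def parse_link_box(text):
    """Split link-box text like 'English 6 274 000+ articles' into (name, count)."""
    tokens = text.split()
    # Position of the first digit group; everything before it is the language name
    start = next(i for i, tok in enumerate(tokens) if NUMBER_GROUP.fullmatch(tok))
    end = start
    while end < len(tokens) and NUMBER_GROUP.fullmatch(tokens[end]):
        end += 1
    name = " ".join(tokens[:start])
    count = int("".join(tokens[start:end]).rstrip("+"))  # '6274000+' -> 6274000
    return name, count


print(parse_link_box("English 6 274 000+ articles"))  # ('English', 6274000)
```

In the notebook this would be `[parse_link_box(language.text) for language in soup10_find]`. The function raises `StopIteration` if a box contains no number.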
#### Create a list of the different kinds of datasets available on data.gov.uk.

In [53]:
# This is the url you will scrape in this exercise
url11 = 'https://data.gov.uk/'

In [71]:
html11 = requests.get(url11).content

soup11 = BeautifulSoup(html11, "html")

soup11_find = soup11.find_all("h3",{"class":"govuk-heading-s dgu-topics__heading"})

topics_list = [str(topic.text) for topic in soup11_find]
topics_list

['Business and economy',
 'Crime and justice',
 'Defence',
 'Education',
 'Environment',
 'Government',
 'Government spending',
 'Health',
 'Mapping',
 'Society',
 'Towns and cities',
 'Transport',
 'Digital service performance',
 'Government reference data']

#### Display the top 10 languages by number of native speakers stored in a pandas dataframe.

In [173]:
# This is the url you will scrape in this exercise
url12 = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'

In [177]:
html12 = requests.get(url12).content

soup12 = BeautifulSoup(html12, "html")

# Take every <tr> on the page and split its text into cells on line breaks
rows = soup12.find_all('tr')
rows = [row.text.strip().split("\n") for row in rows]

colnames = ["Rank", "Language", "Native speakers", "Percentage of world population"]

# Rows 95 to 194 of the whole page held the target table when this was run; this offset is brittle
data = rows[95:195]

df_languages = pd.DataFrame(data, columns=colnames)
df_languages.head(10)

|   | Rank | Language | Native speakers | Percentage of world population |
|---|---|---|---|---|
| 0 | 1 | Mandarin (entire branch) | 935 (955) | 14.1% |
| 1 | 2 | Spanish | 390 (405) | 5.85% |
| 2 | 3 | English | 365 (360) | 5.52% |
| 3 | 4 | Hindi[a] | 295 (310) | 4.46% |
| 4 | 5 | Arabic | 280 (295) | 4.23% |
| 5 | 6 | Portuguese | 205 (215) | 3.08% |
| 6 | 7 | Bengali | 200 (205) | 3.05% |
| 7 | 8 | Russian | 160 (155) | 2.42% |
| 8 | 9 | Japanese | 125 (125) | 1.92% |
| 9 | 10 | Punjabi | 95 (100) | 1.44% |
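
Slicing `rows[95:195]` from every `<tr>` on the page depends on how many rows come before the table, so it can silently break when the article changes. A sketch that reads one specific `<table>` and uses that table's own header row is more robust. Here that table is, hypothetically, the first one with class `wikitable`; check in DevTools that this is the native-speakers table. `table_to_dataframe` is a hypothetical helper, and it does not handle `rowspan`/`colspan`.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(table):
    """Build a DataFrame from a simple HTML table whose first row holds the headers."""
    rows = table.find_all("tr")
    header = [cell.get_text(" ", strip=True) for cell in rows[0].find_all(["th", "td"])]
    body = [[cell.get_text(" ", strip=True) for cell in row.find_all(["th", "td"])]
            for row in rows[1:]]
    return pd.DataFrame(body, columns=header)


# Small hand-written table reusing values from the output above
sample_html = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Language</th><th>Native speakers</th></tr>
  <tr><td>1</td><td>Mandarin (entire branch)</td><td>935 (955)</td></tr>
  <tr><td>2</td><td>Spanish</td><td>390 (405)</td></tr>
</table>
"""
soup = BeautifulSoup(sample_html, "html.parser")
table_df = table_to_dataframe(soup.find("table", {"class": "wikitable"}))
print(table_df)
```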


## Bonus
#### Scrape a certain number of tweets of a given Twitter account.

In [None]:
# This is the url you will scrape in this exercise 
# You will need to append the account handle to this url
url = 'https://twitter.com/'

#### Display IMDB's top 250 data (movie name, initial release, director name and stars) as a pandas dataframe.

In [59]:
# This is the url you will scrape in this exercise 
url13 = 'https://www.imdb.com/chart/top'

In [22]:
html13 = requests.get(url13).content
soup13 = BeautifulSoup(html13, "html")

# Each title cell holds the title link (its title attribute lists the director and stars) and the year
soup13_find = soup13.find_all("td", {"class":"titleColumn"})

rows = [[row.a.text, row.span.text, row.a['title']] for row in soup13_find]

colnames = ["movie name", "initial release", "director name and stars"]

df_imdb = pd.DataFrame(rows, columns=colnames)
df_imdb.head(10)

|   | movie name | initial release | director name and stars |
|---|---|---|---|
| 0 | Les Évadés | (1994) | Frank Darabont (dir.), Tim Robbins, Morgan Fre... |
| 1 | Le parrain | (1972) | Francis Ford Coppola (dir.), Marlon Brando, Al... |
| 2 | Le parrain, 2ème partie | (1974) | Francis Ford Coppola (dir.), Al Pacino, Robert... |
| 3 | The Dark Knight : Le Chevalier noir | (2008) | Christopher Nolan (dir.), Christian Bale, Heat... |
| 4 | 12 hommes en colère | (1957) | Sidney Lumet (dir.), Henry Fonda, Lee J. Cobb |
| 5 | La liste de Schindler | (1993) | Steven Spielberg (dir.), Liam Neeson, Ralph Fi... |
| 6 | Le Seigneur des anneaux : Le Retour du roi | (2003) | Peter Jackson (dir.), Elijah Wood, Viggo Morte... |
| 7 | Pulp Fiction | (1994) | Quentin Tarantino (dir.), John Travolta, Uma T... |
| 8 | Le Bon, la brute, le truand | (1966) | Sergio Leone (dir.), Clint Eastwood, Eli Wallach |
| 9 | Le Seigneur des anneaux : La Communauté de l'a... | (2001) | Peter Jackson (dir.), Elijah Wood, Ian McKellen |
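
The `title` attribute packs the director and the stars into one string, and the year keeps its parentheses. The sketch below splits them apart. It assumes the director always comes first and is marked with `(dir.)`, as in the output above, and that no name contains a comma. `split_credits` and `clean_year` are hypothetical helper names.

```python
def split_credits(title_attr):
    """Split 'Director (dir.), Star 1, Star 2' into (director, [stars])."""
    director, *stars = [part.strip() for part in title_attr.split(",")]
    return director.replace("(dir.)", "").strip(), stars


def clean_year(year_text):
    """Turn '(1957)' into the integer 1957."""
    return int(year_text.strip("()"))


print(split_credits("Sidney Lumet (dir.), Henry Fonda, Lee J. Cobb"))
# ('Sidney Lumet', ['Henry Fonda', 'Lee J. Cobb'])
print(clean_year("(1957)"))  # 1957
```

Both helpers can be applied to the corresponding dataframe columns with `.map()`.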


#### Display the movie name, year and a brief summary of 10 random movies from the IMDB top chart as a pandas dataframe.

In [18]:
#This is the url you will scrape in this exercise
url13 = 'http://www.imdb.com/chart/top'

In [23]:
url_film = [row.a.get('href') for row in soup13_find]

In [20]:
from random import sample

n_random = 10

html = requests.get(url13).content
soup = BeautifulSoup(html, "lxml")
movies = soup.find_all('td', {'class':'titleColumn'})

# Pick n_random distinct movies from the chart
random_movies = sample(movies, n_random)

titles = [movie.find('a').text for movie in random_movies]
years = [movie.find('span').text[1:-1] for movie in random_movies]  # drop the parentheses around the year
links_to_movies = [movie.find('a').get('href') for movie in random_movies]

summary = []
for link in links_to_movies:
    html = requests.get('https://www.imdb.com' + link).content
    soup = BeautifulSoup(html, "lxml")
    summary.append(soup.find('div', {'class':'summary_text'}).text.strip())

movies_dict = {'Title': titles, 'Release': years, 'Summary': summary}

movies_df = pd.DataFrame(movies_dict)
movies_df

|   | Title | Release | Summary |
|---|---|---|---|
| 0 | Gran Torino | 2008 | Disgruntled Korean War veteran Walt Kowalski s... |
| 1 | Apocalypse Now | 1979 | A U.S. Army officer serving in Vietnam is task... |
| 2 | Le Fabuleux Destin d'Amélie Poulain | 2001 | Amélie is an innocent and naive girl in Paris ... |
| 3 | Avengers: Infinity War | 2018 | The Avengers and their allies must be willing ... |
| 4 | La mort aux trousses | 1959 | A New York City advertising executive goes on ... |
| 5 | Voyage à Tokyo | 1953 | An old couple visit their children and grandch... |
| 6 | Reservoir Dogs | 1992 | When a simple jewelry heist goes horribly wron... |
| 7 | 1917 | 2019 | April 6th, 1917. As a regiment assembles to wa... |
| 8 | Le Silence des agneaux | 1991 | A young F.B.I. cadet must receive the help of ... |
| 9 | Stalker | 1979 | A guide leads two men through an area known as... |


#### Find the live weather report (temperature, wind speed, description and weather) of a given city.

In [4]:
#https://openweathermap.org/current
city = input('Enter the city: ')
url50 = 'http://api.openweathermap.org/data/2.5/weather?'+'q='+city+'&APPID=b35975e18dc93725acb092f7272cc6b8&units=metric'

Enter the city:  Orléans


In [8]:
response = requests.get(url50)
# .json() decodes the JSON body into a Python dict
results = response.json()
results

{'coord': {'lon': 1.9039, 'lat': 47.9029},
 'weather': [{'id': 801,
   'main': 'Clouds',
   'description': 'few clouds',
   'icon': '02d'}],
 'base': 'stations',
 'main': {'temp': 11.98,
  'feels_like': 10.36,
  'temp_min': 11,
  'temp_max': 13.33,
  'pressure': 1024,
  'humidity': 43},
 'visibility': 10000,
 'wind': {'speed': 8.75, 'deg': 30},
 'clouds': {'all': 20},
 'dt': 1617460620,
 'sys': {'type': 1,
  'id': 6534,
  'country': 'FR',
  'sunrise': 1617427588,
  'sunset': 1617474265},
 'timezone': 7200,
 'id': 2989317,
 'name': 'Orléans',
 'cod': 200}
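
The question asks only for temperature, wind speed, description and weather, so those fields can be picked out of the JSON. The dictionary below is a trimmed copy of the response shown above. With `units=metric`, OpenWeatherMap reports temperature in °C and wind speed in m/s. `weather_report` is a hypothetical helper name.

```python
def weather_report(results):
    """Extract the four fields requested by the exercise from an OpenWeatherMap response."""
    return {
        "temperature": results["main"]["temp"],   # °C with units=metric
        "wind_speed": results["wind"]["speed"],   # m/s with units=metric
        "description": results["weather"][0]["description"],
        "weather": results["weather"][0]["main"],
    }


# Trimmed copy of the response shown above
results = {
    "weather": [{"id": 801, "main": "Clouds", "description": "few clouds", "icon": "02d"}],
    "main": {"temp": 11.98, "feels_like": 10.36},
    "wind": {"speed": 8.75, "deg": 30},
    "name": "Orléans",
}
report = weather_report(results)
print(report)
```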

#### Find the book name, price and stock availability as a pandas dataframe.

In [54]:
# This is the url you will scrape in this exercise. 
# It is a fictional bookstore created to be scraped. 
url51 = 'http://books.toscrape.com/'
html51 = requests.get(url51).content
soup51 = BeautifulSoup(html51, "html")

soup51_find = soup51.find_all("article", {"class": "product_pod"})

div_prodprice = soup51.find('div', {'class': 'product_price'})

price = div_prodprice.find_all('p')[0].text

stock = div_prodprice.find_all('p')[1].text.strip()

rows5 = [book.h3.a['title']+" "+price+" "+stock for book in soup51_find]
rows5

['A Light in the Attic £51.77 In stock',
 'Tipping the Velvet £51.77 In stock',
 'Soumission £51.77 In stock',
 'Sharp Objects £51.77 In stock',
 'Sapiens: A Brief History of Humankind £51.77 In stock',
 'The Requiem Red £51.77 In stock',
 'The Dirty Little Secrets of Getting Your Dream Job £51.77 In stock',
 'The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull £51.77 In stock',
 'The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics £51.77 In stock',
 'The Black Maria £51.77 In stock',
 'Starving Hearts (Triangular Trade Trilogy, #1) £51.77 In stock',
 "Shakespeare's Sonnets £51.77 In stock",
 'Set Me Free £51.77 In stock',
 "Scott Pilgrim's Precious Little Life (Scott Pilgrim #1) £51.77 In stock",
 'Rip it Up and Start Again £51.77 In stock',
 'Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991 £51.77 In stock',
 'Olio £51.77 In stock',
 'Mesaerion: The Best Science Fiction St
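
In the cell above, `price` and `stock` are read only once, from the first `product_price` div on the page (`soup51.find`). Every book is therefore paired with the same price and stock status, which is why £51.77 repeats in the output. Looking them up inside each `article` fixes this and yields the requested dataframe. The sample is a simplified, hypothetical two-book page that reuses the class names from the cell, with placeholder prices.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Simplified, hypothetical markup with the class names used above; prices are placeholders
sample_html = """
<article class="product_pod">
  <h3><a title="Book One">Book One</a></h3>
  <div class="product_price"><p>£10.00</p>
    <p>
        In stock
    </p></div>
</article>
<article class="product_pod">
  <h3><a title="Book Two">Book Two</a></h3>
  <div class="product_price"><p>£20.50</p><p>Out of stock</p></div>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")
records = []
for book in soup.find_all("article", {"class": "product_pod"}):
    # Search inside this article only, so each book gets its own price and stock
    paragraphs = book.find("div", {"class": "product_price"}).find_all("p")
    records.append([book.h3.a["title"], paragraphs[0].text.strip(), paragraphs[1].text.strip()])

books_df = pd.DataFrame(records, columns=["Title", "Price", "Stock"])
print(books_df)
```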
