## Web Scraping

One way to obtain data is to collect it ourselves, and Python lets us automate this process. This is called web scraping.

To understand how web scraping works, we first need to know the basics of how websites work.

Every time we open a web page, the browser sends a request to a server, which returns a response. If everything went well, the body of that response is the HTML of the page. The HTML may reference other resources (images, sounds, JavaScript files), and the browser fetches each of them with a separate request.
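To make the last point concrete, here is a minimal sketch that uses only the standard library's `html.parser` to list the resources (images, scripts, stylesheets) an HTML document references, i.e. the URLs a browser would request separately. The class name `ResourceCollector` and the sample HTML are illustrative assumptions. A real browser handles many more cases (CSS `@import`, `srcset`, resources added by JavaScript).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Hypothetical helper: collect absolute URLs of images, scripts and stylesheets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []  # absolute URLs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            ref = attrs["src"]
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            # Simplification: only an exact rel="stylesheet" is recognized
            ref = attrs["href"]
        else:
            return
        # Relative references are resolved against the page URL, as a browser would do
        self.resources.append(urljoin(self.base_url, ref))


page = """<html><head>
<link rel="stylesheet" href="/style.css">
<script src="app.js"></script>
</head><body><img src="logo.png"></body></html>"""

collector = ResourceCollector("https://example.com/page/")
collector.feed(page)
print(collector.resources)
```

Each printed URL corresponds to one extra request the browser would make after downloading the HTML.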

When we browse with a web browser, it is hard to abuse a website. A Python script, however, can send thousands of requests per second, which can get us blocked or even overload the server.

The library we will use to send requests is simply called `requests`:

In [2]:
import requests

### GET request

`GET` requests are used to read (retrieve) data.

In [3]:
requests.get('https://api.github.com/')

<Response [200]>

### Response

A request returns an object of type `Response`.

In [4]:
response = requests.get('https://api.github.com/')

In [5]:
response

<Response [200]>

One important attribute of `Response` is `status_code`, which tells us the outcome of the request. The response `200 OK` means that the request succeeded and we received the result.

The meaning of other status codes can be looked up here:
https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

We can easily check whether the request succeeded by using the response directly in a condition:

In [6]:
if response:
    print("success")
else:
    print("error!")

success
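`if response:` works because a `Response` is truthy exactly when `response.ok` is `True`, and `ok` is `False` only for 4xx and 5xx status codes. The sketch below checks this without any network traffic by building an empty `requests.Response` by hand and setting only its `status_code`; the helper name `describe` is made up for the illustration.

```python
import requests


def describe(status_code: int) -> str:
    # An empty Response built by hand: no request is sent, only the status code is set
    r = requests.Response()
    r.status_code = status_code
    # bool(r) delegates to r.ok, which is False only for 4xx/5xx codes
    return f"{status_code}: ok={r.ok}, truthy={bool(r)}"


for code in (200, 301, 404, 500):
    print(describe(code))
```

Note that a redirect code such as 301 still counts as "ok"; only client and server errors make the response falsy.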


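Alternatively, `raise_for_status()` raises an `HTTPError` when the status code is 4xx or 5xx, so failures can be handled with `try`/`except`:

In [7]: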
from requests.exceptions import HTTPError

for url in ['https://api.github.com', 'https://api.github.com/invalid']:
    try:
        response = requests.get(url)

        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
    except Exception as err:
        print(f'Other error occurred: {err}')
    else:
        print('Success!')


Success!
HTTP error occurred: 404 Client Error: Not Found for url: https://api.github.com/invalid


To inspect the body of the response, we use the `content` attribute, which holds the raw bytes:

In [8]:
response = requests.get('https://api.github.com')

In [9]:
response.content

b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","label_sea

Besides `content`, we can also inspect the response headers using `headers`:

In [10]:
response.headers

{'Server': 'GitHub.com', 'Date': 'Fri, 08 Oct 2021 14:25:24 GMT', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept, Accept-Encoding, Accept, X-Requested-With', 'ETag': '"4f825cc84e1c733059d46e76e6df9db557ae5254f9625dfe8e1b09499c449438"', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '0', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'Content-Security-Policy': "default-src 'none'", 'Content-Type': 'application/json; charset=utf-8', 'X-GitHub-Media-Type': 'github.v3; format=json', 'Content-Encoding': 'gzip',

The headers contain information such as `Content-Type`, which tells us what type of data the response body holds:

In [12]:
response.headers['Content-Type']

'application/json; charset=utf-8'
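Checking `Content-Type` before parsing avoids calling `.json()` on, for example, an HTML error page. A minimal sketch; `parse_body` is a hypothetical helper, and it only recognizes plain `application/json`, not vendor types such as `application/vnd.github+json`.

```python
def parse_body(response):
    """Return parsed JSON if the server declares a JSON body, otherwise the decoded text.

    `response` is expected to behave like requests.Response (headers, json(), text).
    """
    content_type = response.headers.get("Content-Type", "")
    # The header can carry parameters, e.g. 'application/json; charset=utf-8'
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return response.json()
    return response.text
```

Used with the response above, `parse_body(response)` would return a dictionary, because GitHub declares `application/json; charset=utf-8`.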

### Querying an API

`GET` requests can be parameterized with a query string; in `requests` the parameters are passed as a dictionary in the `params` argument.

The request below queries the GitHub API for Python repositories related to `requests`.

GitHub API documentation: https://docs.github.com/en/rest
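To see the exact URL that `requests` builds from `params` without sending anything, we can prepare a request object. This also shows a subtlety: `params` values are URL-encoded, so a space becomes `+` while a literal `+` becomes `%2B`. In query strings `+` conventionally stands for a space, so writing the GitHub query with a space (`'requests language:python'`) is the more usual form; we have not verified whether GitHub treats the literal `+` differently.

```python
import requests

url = "https://api.github.com/search/repositories"
prepared_urls = []

for query in ("requests language:python", "requests+language:python"):
    # prepare() builds the final URL locally; nothing is sent over the network
    prepared = requests.Request("GET", url, params={"q": query}).prepare()
    prepared_urls.append(prepared.url)
    print(prepared.url)
```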

In [13]:
response = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
)

json_response = response.json()
repository = json_response['items'][0]

repository

{'id': 4290214,
 'node_id': 'MDEwOlJlcG9zaXRvcnk0MjkwMjE0',
 'name': 'grequests',
 'full_name': 'spyoungtech/grequests',
 'private': False,
 'owner': {'login': 'spyoungtech',
  'id': 15212758,
  'node_id': 'MDQ6VXNlcjE1MjEyNzU4',
  'avatar_url': 'https://avatars.githubusercontent.com/u/15212758?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/spyoungtech',
  'html_url': 'https://github.com/spyoungtech',
  'followers_url': 'https://api.github.com/users/spyoungtech/followers',
  'following_url': 'https://api.github.com/users/spyoungtech/following{/other_user}',
  'gists_url': 'https://api.github.com/users/spyoungtech/gists{/gist_id}',
  'starred_url': 'https://api.github.com/users/spyoungtech/starred{/owner}{/repo}',
  'subscriptions_url': 'https://api.github.com/users/spyoungtech/subscriptions',
  'organizations_url': 'https://api.github.com/users/spyoungtech/orgs',
  'repos_url': 'https://api.github.com/users/spyoungtech/repos',
  'events_url': 'https://api.github.com/

### POST requests

`POST` requests are used to send data to the server. In this case the data should be passed in the request body rather than in the query string. In `requests` this is done with the `data` parameter (form-encoded) or the `json` parameter (serialized to JSON).

In [14]:
response = requests.post('https://httpbin.org/post', data={'key':'value'})

In [15]:
response.content

b'{\n  "args": {}, \n  "data": "", \n  "files": {}, \n  "form": {\n    "key": "value"\n  }, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Content-Length": "9", \n    "Content-Type": "application/x-www-form-urlencoded", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.26.0", \n    "X-Amzn-Trace-Id": "Root=1-61605530-765873e9738d02c12bb1c1ea"\n  }, \n  "json": null, \n  "origin": "82.84.248.139", \n  "url": "https://httpbin.org/post"\n}\n'
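The difference between `data` and `json` is visible in the prepared request: `data` produces a form-encoded body, while `json` serializes the dictionary and sets `Content-Type: application/json`. A sketch using `prepare()` so that nothing is sent; the httpbin.org address from the example above is used only as a URL.

```python
import json

import requests

url = "https://httpbin.org/post"
payload = {"key": "value"}

# prepare() builds the request locally without sending it
as_form = requests.Request("POST", url, data=payload).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

print(as_form.headers["Content-Type"], as_form.body)  # form-encoded body
print(as_json.headers["Content-Type"], as_json.body)  # JSON body (bytes)
```

This matches the httpbin response above, where the dictionary sent with `data` shows up under `"form"` and `"json"` is `null`.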

### Timeout

When we send a request to an external service, we have to wait for its response. The service may never respond; to protect ourselves against that, we can use the `timeout` parameter (in seconds). Without a timeout, `requests` can wait indefinitely.

In [16]:
requests.get('https://api.github.com', timeout=1)
requests.get('https://api.github.com', timeout=3.05)

<Response [200]>
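`timeout` can also be a `(connect, read)` tuple, and it is not a limit on the total download time: it bounds how long `requests` waits to establish the connection and how long it waits for data from the server. When it expires, `requests` raises `requests.exceptions.Timeout`. A minimal sketch of handling that (and connection failures) gracefully; `fetch_or_none` and its default values are assumptions.

```python
import requests


def fetch_or_none(url: str, timeout=(3.05, 10)):
    """Return the Response, or None if the server could not be reached in time."""
    try:
        # (connect timeout, read timeout) in seconds
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        print(f"Timed out: {url}")
    except requests.exceptions.ConnectionError:
        print(f"Could not connect: {url}")
    return None
```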

### Saving HTML to a file:

To avoid overusing a site's resources and getting blocked, it is a good idea to save the responses to a file and work on them offline:

In [17]:
r = requests.get('https://api.github.com', timeout=1)

In [18]:
def save_html(html, path):
    with open(path, 'wb') as f:
        f.write(html)
        
        
save_html(r.content, 'github_com')

In [19]:
def open_html(path):
    with open(path, 'rb') as f:
        return f.read()
    
    
html = open_html('github_com')
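The two helpers above can be combined into a simple cache: download a page only if it has not been saved yet. A sketch assuming one local file per URL is enough (no expiry, no cache invalidation); `get_cached` is a hypothetical name.

```python
from pathlib import Path

import requests


def get_cached(url: str, path: str, timeout: float = 10) -> bytes:
    """Return the body from `path` if it exists; otherwise download it and save it there."""
    cache = Path(path)
    if cache.exists():
        return cache.read_bytes()  # no request is made
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # don't cache error pages
    cache.write_bytes(response.content)
    return response.content
```

Running the notebook again then reads from disk instead of hitting the server.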


### Reading a web page's HTML:

In [20]:
import requests

url = 'https://dataquestio.github.io/web-scraping-pages/simple.html'

r = requests.get(url)

print(r.content[:100])


b'<!DOCTYPE html>\n<html>\n    <head>\n        <title>A simple example page</title>\n    </head>\n    <body'


### BeautifulSoup

Using `requests` we can get the HTML of a web page. However, the data we receive is usually large and complex, and extracting information from it by hand is tedious. To make this easier we can use the `beautifulsoup4` package, which parses HTML and XML documents.

In [21]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.content, 'html.parser')

In [22]:
print(soup.prettify())

<!DOCTYPE html>
<html>
 <head>
  <title>
   A simple example page
  </title>
 </head>
 <body>
  <p>
   Here is some simple content for this page.
  </p>
 </body>
</html>


The `children` attribute lets us iterate over the direct children of an element; by repeating this step we can walk down the HTML tree one level at a time:

In [27]:
list(soup.children)

['html',
 '\n',
 <html>
 <head>
 <title>A simple example page</title>
 </head>
 <body>
 <p>Here is some simple content for this page.</p>
 </body>
 </html>]

In [28]:
[type(item) for item in soup.children]

[bs4.element.Doctype, bs4.element.NavigableString, bs4.element.Tag]

Here we are interested in the third element of the list, which represents the `<html>` tag:

In [29]:
html = list(soup.children)[2]

In [25]:
list(html.children)

['\n',
 <head>
 <title>A simple example page</title>
 </head>,
 '\n',
 <body>
 <p>Here is some simple content for this page.</p>
 </body>,
 '\n']

Similarly, we extract the `<body>` tag:

In [31]:
body = list(html.children)[3]

From there we can easily get to the text:

In [32]:
list(body.children)

['\n', <p>Here is some simple content for this page.</p>, '\n']

In [33]:
p = list(body.children)[1]

In [34]:
p.get_text()

'Here is some simple content for this page.'
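Walking `children` by hand works for a tiny page but quickly gets tedious. `descendants` visits the whole subtree in document order, whereas `children` yields only the direct children. A small comparison on a made-up HTML string:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = "<html><head><title>T</title></head><body><p>Hi</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# children: only the direct children of <html>
direct = [c.name for c in soup.html.children if isinstance(c, Tag)]
# descendants: every nested element, depth-first in document order
everything = [d.name for d in soup.descendants if isinstance(d, Tag)]

print(direct)      # ['head', 'body']
print(everything)  # ['html', 'head', 'title', 'body', 'p']
```

The `isinstance(..., Tag)` filter skips text nodes such as the `'\n'` strings seen above.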

So that we don't have to search the page manually, `bs4` provides methods for finding elements by tag or CSS class:

In [36]:
soup = BeautifulSoup(r.content, 'html.parser')
soup.find_all('p')  # find all <p> elements

[<p>Here is some simple content for this page.</p>]

In [37]:
soup.find_all('p')[0].get_text()

'Here is some simple content for this page.'

In [38]:
page = requests.get("https://dataquestio.github.io/web-scraping-pages/ids_and_classes.html")
soup = BeautifulSoup(page.content, 'html.parser')
soup

<html>
<head>
<title>A simple example page</title>
</head>
<body>
<div>
<p class="inner-text first-item" id="first">
                First paragraph.
            </p>
<p class="inner-text">
                Second paragraph.
            </p>
</div>
<p class="outer-text first-item" id="second">
<b>
                First outer paragraph.
            </b>
</p>
<p class="outer-text">
<b>
                Second outer paragraph.
            </b>
</p>
</body>
</html>

Finding elements with the `<p>` tag and the `outer-text` class:

In [55]:
soup.find_all('p', class_='outer-text')

[<p class="outer-text first-item" id="second">
 <b>
                 First outer paragraph.
             </b>
 </p>,
 <p class="outer-text">
 <b>
                 Second outer paragraph.
             </b>
 </p>]

The tag name can be omitted to search by class alone:

In [56]:
soup.find_all(class_="outer-text")

[<p class="outer-text first-item" id="second">
 <b>
                 First outer paragraph.
             </b>
 </p>,
 <p class="outer-text">
 <b>
                 Second outer paragraph.
             </b>
 </p>]

The `select` method takes a CSS selector; `div p` matches every `<p>` nested inside a `<div>`:

In [57]:
soup.select("div p")

[<p class="inner-text first-item" id="first">
                 First paragraph.
             </p>,
 <p class="inner-text">
                 Second paragraph.
             </p>]
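Besides `find_all`, `find(id=...)` returns the first matching element and `select_one` returns the first match of a CSS selector. A sketch on a shortened copy of the `ids_and_classes.html` markup printed above (whitespace trimmed):

```python
from bs4 import BeautifulSoup

html = """
<div>
  <p class="inner-text first-item" id="first">First paragraph.</p>
  <p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find(id="first")                          # element with id="first"
second_bold = soup.select_one("#second b")             # <b> inside the element with id="second"
both_classes = soup.select("p.outer-text.first-item")  # <p> that has both classes

print(first.get_text(strip=True))        # First paragraph.
print(second_bold.get_text(strip=True))  # First outer paragraph.
print(len(both_classes))                 # 1
```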

### Scraping a weather forecast page

Let's try to scrape the weather forecast for the next 7 days in San Francisco, using this page:
https://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168

Inspecting the page source, we can see that we are interested in the element with the id `seven-day-forecast`. The individual forecasts are marked with the `tombstone-container` class.

In [39]:
page = requests.get("https://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168")
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
tonight = forecast_items[0]
print(tonight.prettify())

<div class="tombstone-container">
 <p class="period-name">
  Today
  <br/>
  <br/>
 </p>
 <p>
  <img alt="Today: Mostly cloudy, then gradually becoming sunny, with a high near 64. Breezy, with a west wind 15 to 24 mph, with gusts as high as 31 mph. " class="forecast-icon" src="DualImage.php?i=sct&amp;j=wind_few" title="Today: Mostly cloudy, then gradually becoming sunny, with a high near 64. Breezy, with a west wind 15 to 24 mph, with gusts as high as 31 mph. "/>
 </p>
 <p class="short-desc">
  Mostly Sunny
  <br/>
  then Sunny
  <br/>
  and Breezy
 </p>
 <p class="temp temp-high">
  High: 64 °F
 </p>
</div>


In [40]:
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
periods

['Today',
 'Tonight',
 'Saturday',
 'SaturdayNight',
 'Sunday',
 'SundayNight',
 'ColumbusDay',
 'MondayNight',
 'Tuesday']

Using the other classes, we can extract the individual pieces of information:

In [41]:
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]
print(short_descs)
print(temps)
print(descs)

['Mostly Sunnythen Sunnyand Breezy', 'ChanceShowers', 'Slight ChanceShowers thenSunny', 'Mostly Clear', 'Sunny', 'Mostly Clear', 'Sunny', 'Mostly Clear', 'Sunny']
['High: 64 °F', 'Low: 59 °F', 'High: 69 °F', 'Low: 54 °F', 'High: 69 °F', 'Low: 53 °F', 'High: 67 °F', 'Low: 54 °F', 'High: 70 °F']
['Today: Mostly cloudy, then gradually becoming sunny, with a high near 64. Breezy, with a west wind 15 to 24 mph, with gusts as high as 31 mph. ', 'Tonight: A 40 percent chance of showers, mainly after 11pm.  Mostly cloudy, with a steady temperature around 59. West wind 15 to 20 mph decreasing to 5 to 10 mph after midnight. Winds could gust as high as 25 mph. ', 'Saturday: A 20 percent chance of showers before 11am.  Mostly cloudy through mid morning, then gradual clearing, with a high near 69. East southeast wind 5 to 8 mph becoming calm. ', 'Saturday Night: Mostly clear, with a low around 54. West southwest wind 13 to 18 mph decreasing to 5 to 10 mph after midnight. Winds could gust as high as
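`get_text()` concatenates the text pieces around `<br>` tags without any separator, which is why we get values like `'SaturdayNight'` and `'Mostly Sunnythen Sunnyand Breezy'`. Passing `separator=" "` and `strip=True` keeps the words apart. The HTML string below only imitates the structure printed above; on the live page the same arguments could be passed in the list comprehensions.

```python
from bs4 import BeautifulSoup

# Markup shaped like the short-desc cell printed above
html = '<p class="short-desc">Mostly Sunny<br/>then Sunny<br/>and Breezy</p>'
p = BeautifulSoup(html, "html.parser").p

print(p.get_text())                           # Mostly Sunnythen Sunnyand Breezy
print(p.get_text(separator=" ", strip=True))  # Mostly Sunny then Sunny and Breezy
```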

We can put the results into pandas and create a DataFrame from them:

In [42]:
import pandas as pd
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc":descs
})
weather

|   | period | short_desc | temp | desc |
|---|---|---|---|---|
| 0 | Today | Mostly Sunnythen Sunnyand Breezy | High: 64 °F | Today: Mostly cloudy, then gradually becoming ... |
| 1 | Tonight | ChanceShowers | Low: 59 °F | Tonight: A 40 percent chance of showers, mainl... |
| 2 | Saturday | Slight ChanceShowers thenSunny | High: 69 °F | Saturday: A 20 percent chance of showers befor... |
| 3 | SaturdayNight | Mostly Clear | Low: 54 °F | Saturday Night: Mostly clear, with a low aroun... |
| 4 | Sunday | Sunny | High: 69 °F | Sunday: Sunny, with a high near 69. West wind ... |
| 5 | SundayNight | Mostly Clear | Low: 53 °F | Sunday Night: Mostly clear, with a low around 53. |
| 6 | ColumbusDay | Sunny | High: 67 °F | Columbus Day: Sunny, with a high near 67. |
| 7 | MondayNight | Mostly Clear | Low: 54 °F | Monday Night: Mostly clear, with a low around 54. |
| 8 | Tuesday | Sunny | High: 70 °F | Tuesday: Sunny, with a high near 70. |


Next, let's extract the numeric temperature using a regular expression:

More about regular expressions: https://regexone.com/

In [43]:
temp_nums = weather["temp"].str.extract(r"(\d+)", expand=False)  # raw string: avoids the invalid "\d" escape warning
weather["temp_num"] = temp_nums.astype('int')
temp_nums

0    64
1    59
2    69
3    54
4    69
5    53
6    67
7    54
8    70
Name: temp, dtype: object
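A single regular expression with named groups can extract both the kind of temperature (`High`/`Low`) and its value, which also yields the night flag computed in the next step. A sketch on a few sample values copied from the output above:

```python
import pandas as pd

# A few values in the same format as the temp column above
temps = pd.Series(["High: 64 °F", "Low: 59 °F", "High: 69 °F"])

# Named groups become column names; the raw string keeps \d and \s intact
parts = temps.str.extract(r"(?P<kind>High|Low):\s*(?P<value>\d+)")
parts["value"] = parts["value"].astype(int)
parts["is_night"] = parts["kind"] == "Low"
print(parts)
```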

To find only the night forecasts, we can use the `temp` column and select the rows that contain the word "Low".

In [45]:
is_night = weather["temp"].str.contains("Low")
weather["is_night"] = is_night
is_night

0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
Name: temp, dtype: bool

In [46]:
weather[is_night]

|   | period | short_desc | temp | desc | temp_num | is_night |
|---|---|---|---|---|---|---|
| 1 | Tonight | ChanceShowers | Low: 59 °F | Tonight: A 40 percent chance of showers, mainl... | 59 | True |
| 3 | SaturdayNight | Mostly Clear | Low: 54 °F | Saturday Night: Mostly clear, with a low aroun... | 54 | True |
| 5 | SundayNight | Mostly Clear | Low: 53 °F | Sunday Night: Mostly clear, with a low around 53. | 53 | True |
| 7 | MondayNight | Mostly Clear | Low: 54 °F | Monday Night: Mostly clear, with a low around 54. | 54 | True |



### Exercise

Another example of web scraping is a site that rates the bias of media outlets in the United States.

Try to scrape the table from AllSides Media Bias Ratings. You can use the code below as a starting point:

In [49]:
import requests

url = 'https://www.allsides.com/media-bias/media-bias-ratings'

r = requests.get(url)

print(r.content[:100])


b'<!DOCTYPE html>\n<html  lang="en" dir="ltr" prefix="og: http://ogp.me/ns# content: http://purl.org/rs'


In [73]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(r.content, 'html.parser')


In [74]:
rows = soup.select('tbody tr')

In [75]:
row = rows[0]

name = row.select_one('.source-title').text.strip()

print(name)


ABC News (Online)


In [76]:
allsides_page = row.select_one('.source-title a')['href']
allsides_page = 'https://www.allsides.com' + allsides_page

print(allsides_page)


https://www.allsides.com/news-source/abc-news-media-bias
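Gluing the domain on with `+` only works for hrefs that start with `/`. The standard library's `urllib.parse.urljoin` resolves relative and absolute links the way a browser does. A small sketch (the second href is a made-up example):

```python
from urllib.parse import urljoin

base = "https://www.allsides.com/media-bias/media-bias-ratings"

# A root-relative href is resolved against the site's host
abc = urljoin(base, "/news-source/abc-news-media-bias")
# An href that is already absolute is returned unchanged
other = urljoin(base, "https://example.com/other")

print(abc)    # https://www.allsides.com/news-source/abc-news-media-bias
print(other)  # https://example.com/other
```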


In [77]:
bias = row.select_one('.views-field-field-bias-image a')['href']
bias = bias.split('/')[-1]

print(bias)


left-center


In [78]:
agree = row.select_one('.agree').text
agree = int(agree)

disagree = row.select_one('.disagree').text
disagree = int(disagree)

agree_ratio = agree / disagree

print(f"Agree: {agree}, Disagree: {disagree}, Ratio {agree_ratio:.2f}")


Agree: 35455, Disagree: 17959, Ratio 1.97


In [80]:
def get_agreeance_text(ratio):
    if ratio > 3: return "absolutely agrees"
    elif 2 < ratio <= 3: return "strongly agrees"
    elif 1.5 < ratio <= 2: return "agrees"
    elif 1 < ratio <= 1.5: return "somewhat agrees"
    elif ratio == 1: return "neutral"
    elif 0.67 < ratio < 1: return "somewhat disagrees"
    elif 0.5 < ratio <= 0.67: return "disagrees"
    elif 0.33 < ratio <= 0.5: return "strongly disagrees"
    elif ratio <= 0.33: return "absolutely disagrees"
    else: return None
    
print(get_agreeance_text(2.5))


strongly agrees


In [81]:
data = []

for row in rows:
    d = dict()
    
    d['name'] = row.select_one('.source-title').text.strip()
    d['allsides_page'] = 'https://www.allsides.com' + row.select_one('.source-title a')['href']
    d['bias'] = row.select_one('.views-field-field-bias-image a')['href'].split('/')[-1]
    d['agree'] = int(row.select_one('.agree').text)
    d['disagree'] = int(row.select_one('.disagree').text)
    d['agree_ratio'] = d['agree'] / d['disagree']
    d['agreeance_text'] = get_agreeance_text(d['agree_ratio'])
    
    data.append(d)
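The loop body above is repeated in the next cell; pulling it into a function keeps both copies in sync. A sketch of such a refactor: `parse_row` is a hypothetical name, the CSS selectors are the ones used above (they will stop working if AllSides changes its markup), `urljoin` replaces the string concatenation, and a row with zero disagree votes gets `nan` instead of raising `ZeroDivisionError`.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.allsides.com"


def parse_row(row) -> dict:
    """Extract one media source from a <tr> (bs4 Tag) of the AllSides ratings table."""
    title = row.select_one(".source-title")
    agree = int(row.select_one(".agree").get_text().strip())
    disagree = int(row.select_one(".disagree").get_text().strip())
    bias_href = row.select_one(".views-field-field-bias-image a")["href"]
    return {
        "name": title.get_text().strip(),
        "allsides_page": urljoin(BASE_URL, title.select_one("a")["href"]),
        "bias": bias_href.split("/")[-1],
        "agree": agree,
        "disagree": disagree,
        # Avoid ZeroDivisionError when a source has no "disagree" votes
        "agree_ratio": agree / disagree if disagree else float("nan"),
    }
```

The loop then reduces to `d = parse_row(row)`, followed by `d['agreeance_text'] = get_agreeance_text(d['agree_ratio'])` and `data.append(d)`. For a `nan` ratio, `get_agreeance_text` falls through to its final `else` and returns `None`.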


In [84]:
pages = [
    'https://www.allsides.com/media-bias/media-bias-ratings',
#    'https://www.allsides.com/media-bias/media-bias-ratings?page=1',
#    'https://www.allsides.com/media-bias/media-bias-ratings?page=2'
]


In [85]:
from time import sleep

data = []

for page in pages:
    r = requests.get(page)
    soup = BeautifulSoup(r.content, 'html.parser')
    
    rows = soup.select('tbody tr')

    for row in rows:
        d = dict()

        d['name'] = row.select_one('.source-title').text.strip()
        d['allsides_page'] = 'https://www.allsides.com' + row.select_one('.source-title a')['href']
        d['bias'] = row.select_one('.views-field-field-bias-image a')['href'].split('/')[-1]
        d['agree'] = int(row.select_one('.agree').text)
        d['disagree'] = int(row.select_one('.disagree').text)
        d['agree_ratio'] = d['agree'] / d['disagree']
        d['agreeance_text'] = get_agreeance_text(d['agree_ratio'])

        data.append(d)
    
    sleep(10)  # pause between pages so we don't overload the server
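A variant of the page loop as a reusable generator, sketched under a few assumptions: a `requests.Session` reuses the connection between pages, the pause is skipped before the first request (the cell above also sleeps after the last page), and an HTTP error stops the loop instead of silently parsing an error page. `fetch_pages` and its defaults are hypothetical.

```python
import time

import requests


def fetch_pages(urls, delay=10.0, timeout=10.0, session=None):
    """Yield (url, response) pairs, waiting `delay` seconds between consecutive requests."""
    session = session or requests.Session()  # reuses the TCP connection between pages
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # be polite: no burst of requests
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
        yield url, response
```

Usage: `for url, r in fetch_pages(pages): soup = BeautifulSoup(r.content, 'html.parser')`, then parse `soup.select('tbody tr')` as before.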


In [86]:
data

[{'name': 'ABC News (Online)',
  'allsides_page': 'https://www.allsides.com/news-source/abc-news-media-bias',
  'bias': 'left-center',
  'agree': 35455,
  'disagree': 17959,
  'agree_ratio': 1.9742190545130576,
  'agreeance_text': 'agrees'},
 {'name': 'AlterNet',
  'allsides_page': 'https://www.allsides.com/news-source/alternet-media-bias',
  'bias': 'left',
  'agree': 13706,
  'disagree': 2968,
  'agree_ratio': 4.617924528301887,
  'agreeance_text': 'absolutely agrees'},
 {'name': 'Associated Press',
  'allsides_page': 'https://www.allsides.com/news-source/associated-press-media-bias',
  'bias': 'center',
  'agree': 26761,
  'disagree': 20602,
  'agree_ratio': 1.2989515581011553,
  'agreeance_text': 'somewhat agrees'},
 {'name': 'Axios',
  'allsides_page': 'https://www.allsides.com/news-source/axios',
  'bias': 'center',
  'agree': 6092,
  'disagree': 6462,
  'agree_ratio': 0.9427421850820179,
  'agreeance_text': 'somewhat disagrees'},
 {'name': 'BBC News',
  'allsides_page': 'https:/

In [88]:
df = pd.DataFrame(data)

In [89]:
df['total_votes'] = df['agree'] + df['disagree']
df.sort_values('total_votes', ascending=False, inplace=True)

df.head(10)


|   | name | allsides_page | bias | agree | disagree | agree_ratio | agreeance_text | total_votes |
|---|---|---|---|---|---|---|---|---|
| 44 | TheBlaze.com | https://www.allsides.com/news-source/theblaze-... | right | 99685 | 80239 | 1.242351 | somewhat agrees | 179924 |
| 10 | CNN (Online News) | https://www.allsides.com/news-source/cnn-media... | left | 50943 | 47804 | 1.065664 | somewhat agrees | 98747 |
| 16 | Fox News (Online News) | https://www.allsides.com/news-source/fox-news-... | right | 41844 | 47667 | 0.87784 | somewhat disagrees | 89511 |
| 24 | New York Times (News) | https://www.allsides.com/news-source/new-york-... | left-center | 28942 | 38784 | 0.746236 | somewhat disagrees | 67726 |
| 27 | NPR (Online News) | https://www.allsides.com/news-source/npr-media... | center | 31634 | 30014 | 1.053975 | somewhat agrees | 61648 |
| 18 | HuffPost | https://www.allsides.com/news-source/huffpost-... | left | 35462 | 22240 | 1.594514 | agrees | 57702 |
| 4 | BBC News | https://www.allsides.com/news-source/bbc-news-... | center | 29300 | 25126 | 1.166123 | somewhat agrees | 54426 |
| 29 | Politico | https://www.allsides.com/news-source/politico-... | left-center | 23363 | 30184 | 0.774019 | somewhat disagrees | 53547 |
| 0 | ABC News (Online) | https://www.allsides.com/news-source/abc-news-... | left-center | 35455 | 17959 | 1.974219 | agrees | 53414 |
| 6 | Breitbart News | https://www.allsides.com/news-source/breitbart | right | 39306 | 11241 | 3.496664 | absolutely agrees | 50547 |
