# Scripting Languages in Data Analysis - web scraping with Beautiful Soup
###### dr inż. Marcin Lawnik

# Web scraping (according to Wikipedia)

```
Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.
```

### Web scraping and the law

[Polish Act on the Protection of Databases (Ustawa o ochronie baz danych)](http://isap.sejm.gov.pl/isap.nsf/DocDetails.xsp?id=WDU20011281402)

[https://miroslawmamczur.pl/web-scraping-co-to-i-jakie-sa-dobre-praktyki/](https://miroslawmamczur.pl/web-scraping-co-to-i-jakie-sa-dobre-praktyki/)

### The `webbrowser` module

In [1]:
import webbrowser
webbrowser.open("https://www.polsl.pl/rms/")

True

### The `requests` module

#### `get()`

In [2]:
import requests
strona = requests.get("https://www.polsl.pl/rms/")

#### `status_code` and `requests.codes.ok`

In [3]:
strona.status_code

200

In [4]:
strona.status_code == requests.codes.ok

True
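Comparing `status_code` with `requests.codes.ok` works, but a common alternative is to let `requests` raise an exception for error responses. The sketch below wraps this in a helper; the name `fetch_page` and the 10-second timeout are illustrative assumptions, not part of the lecture code.

```python
import requests


def fetch_page(url: str, timeout: float = 10.0) -> requests.Response:
    """Download `url` and fail loudly on HTTP errors (hypothetical helper)."""
    # A timeout keeps the call from hanging forever on an unresponsive server.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    return response


# requests.codes.ok is simply the integer 200.
print(requests.codes.ok)
```

Calling `fetch_page("https://www.polsl.pl/rms/")` would then either return the response or raise, so no separate status check is needed.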

#### `text` and `content`

`text` holds the response body decoded to a Unicode string (`str`), while `content` holds the same body as raw `bytes`.
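The relationship between the two attributes can be illustrated without a network call. In this minimal sketch a hand-made byte string stands in for `strona.content`, and decoding it stands in for what `strona.text` provides; the variable names are assumptions made for the example.

```python
# Stand-in for strona.content: raw bytes as they travel over the network.
raw = "Politechnika Śląska".encode("utf-8")

# Stand-in for strona.text: the same bytes decoded to a Unicode string.
decoded = raw.decode("utf-8")

# The byte count is larger because Ś and ą take two bytes each in UTF-8.
print(type(raw).__name__, len(raw))
print(type(decoded).__name__, len(decoded))
```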

In [5]:
print(strona.text)

<!DOCTYPE html>
<html lang="pl-PL">
<head>
	<meta charset="utf-8"/>
	<meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Politechnika Śląska | Wydział Matematyki Stosowanej</title>
        <link rel="stylesheet" href="/wp-content/themes/politechnika-v2/assets/css/bootstrap.min.css" type="text/css" media="screen" />
        <link rel="stylesheet" href="/wp-content/themes/politechnika-v2/assets/css/flickity.css" type="text/css" media="screen" />
	<link rel="stylesheet" href="/wp-content/themes/politechnika-v2/assets/css/style.css" type="text/css" media="screen" />
        <link rel="stylesheet" href="/wp-content/themes/politechnika-v2/assets/css/responsive.css" type="text/css" media="screen" />
        <link rel="stylesheet" href="/wp-content/themes/politechnika-wydzial/assets/css/responsive.css" type="text/css" media="screen" />        
        <link rel="stylesheet" href="/wp-content/themes/politechnika-wydzial/assets/css/custom.css" type="text/css" 

In [6]:
print(strona.content)



#### Saving the page

In [7]:
with open('strona.html', 'wb') as plik:
    # iter_content() takes a chunk size in bytes; 100 kB is an arbitrary choice
    for fragment in strona.iter_content(100_000):
        plik.write(fragment)

### The `Beautiful Soup` module

[https://www.crummy.com/software/BeautifulSoup/bs4/doc/](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

#### Installation

```
conda install -c anaconda beautifulsoup4 
```

In [8]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(strona.content, 'html.parser')

print(soup)

<!DOCTYPE html>

<html lang="pl-PL">
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<title>Politechnika Śląska | Wydział Matematyki Stosowanej</title>
<link href="/wp-content/themes/politechnika-v2/assets/css/bootstrap.min.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-v2/assets/css/flickity.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-v2/assets/css/style.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-v2/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-wydzial/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-wydzial/assets/css/custom.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-v2/as

#### `prettify()`

In [9]:
print(soup.prettify())

<!DOCTYPE html>
<html lang="pl-PL">
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <title>
   Politechnika Śląska | Wydział Matematyki Stosowanej
  </title>
  <link href="/wp-content/themes/politechnika-v2/assets/css/bootstrap.min.css" media="screen" rel="stylesheet" type="text/css"/>
  <link href="/wp-content/themes/politechnika-v2/assets/css/flickity.css" media="screen" rel="stylesheet" type="text/css"/>
  <link href="/wp-content/themes/politechnika-v2/assets/css/style.css" media="screen" rel="stylesheet" type="text/css"/>
  <link href="/wp-content/themes/politechnika-v2/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>
  <link href="/wp-content/themes/politechnika-wydzial/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>
  <link href="/wp-content/themes/politechnika-wydzial/assets/css/custom.css" media="screen" rel="stylesheet" type="text/css"/>
  <link href="/wp-conten

#### Accessing tags

In [10]:
soup.title

<title>Politechnika Śląska | Wydział Matematyki Stosowanej</title>

In [11]:
soup.title.string

'Politechnika Śląska | Wydział Matematyki Stosowanej'

In [12]:
soup.a

<a href="https://outlook.office.com/owa/polsl.pl/" rel="noopener" target="_blank">Poczta</a>
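Attribute-style access such as `soup.a` returns only the first matching tag, and `None` when no such tag exists. A small sketch on a hand-written HTML snippet (the snippet and the name `soup_demo` are assumptions made for illustration):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Demo</title></head>
<body>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body></html>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# Attribute-style access returns only the FIRST matching tag.
first_link = soup_demo.a
print(first_link["href"])

# A tag that does not exist yields None instead of raising an error,
# so chained access such as soup_demo.table.tr needs a guard.
missing = soup_demo.table
print(missing is None)
```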

In [13]:
soup.div.div

<div class="container">
<div class="row">
<ul class="mn-archive-site-mail"><li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-24844" id="menu-item-24844"><a href="https://outlook.office.com/owa/polsl.pl/" rel="noopener" target="_blank">Poczta</a></li>
</ul> <div class="mn-right-side">
<div class="mn-font-resize fonts-size">
<a class="font-small" href="">A</a>
<a class="font-medium" href="">A<span>+</span></a>
<a class="font-large" href="">A<span>++</span></a>
</div>
<a class="mn-contrast" href="#">
<img alt="contrast" src="/wp-content/themes/politechnika-v2/assets/images/contrast.svg"/>
</a>
<a class="mn-sound" href="#">
<img alt="mute" src="/wp-content/themes/politechnika-v2/assets/images/speakers.svg"/>
</a>
<ul class="mn-lang-switcher mn-custom-select">
<select id="lang_choice_1" name="lang_choice_1">
<option "="" selected="selected" value="pl">pl</option><option "="" value="en">en</option> </select>
<script type="text/javascript">
                         

**List of children**

In [14]:
soup.head.contents

['\n',
 <meta charset="utf-8"/>,
 '\n',
 <meta content="width=device-width, initial-scale=1" name="viewport"/>,
 '\n',
 <title>Politechnika Śląska | Wydział Matematyki Stosowanej</title>,
 '\n',
 <link href="/wp-content/themes/politechnika-v2/assets/css/bootstrap.min.css" media="screen" rel="stylesheet" type="text/css"/>,
 '\n',
 <link href="/wp-content/themes/politechnika-v2/assets/css/flickity.css" media="screen" rel="stylesheet" type="text/css"/>,
 '\n',
 <link href="/wp-content/themes/politechnika-v2/assets/css/style.css" media="screen" rel="stylesheet" type="text/css"/>,
 '\n',
 <link href="/wp-content/themes/politechnika-v2/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>,
 '\n',
 <link href="/wp-content/themes/politechnika-wydzial/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>,
 '\n',
 <link href="/wp-content/themes/politechnika-wydzial/assets/css/custom.css" media="screen" rel="stylesheet" type="text/css"/>,
 '\n',
 <lin

**List of descendants**

In [15]:
for dziecko in soup.title.descendants:
    print(dziecko)

Politechnika Śląska | Wydział Matematyki Stosowanej
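The difference between children and descendants is easier to see on a nested structure than on `<title>`, which has a single text child. The following sketch uses a made-up list fragment; `soup_demo` is an assumed name.

```python
from bs4 import BeautifulSoup, Tag

soup_demo = BeautifulSoup("<ul><li><b>one</b></li><li>two</li></ul>", "html.parser")
ul = soup_demo.ul

# .contents lists direct children only.
children = [child.name for child in ul.contents]

# .descendants walks the whole subtree, including text nodes;
# keeping only Tag objects filters the strings out.
descendant_tags = [node.name for node in ul.descendants if isinstance(node, Tag)]

print(children)
print(descendant_tags)
```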


**Parent**

In [16]:
soup.title.parent

<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<title>Politechnika Śląska | Wydział Matematyki Stosowanej</title>
<link href="/wp-content/themes/politechnika-v2/assets/css/bootstrap.min.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-v2/assets/css/flickity.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-v2/assets/css/style.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-v2/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-wydzial/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-wydzial/assets/css/custom.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/wp-content/themes/politechnika-v2/assets/css/ie.css" media="screen" rel="

**Siblings**

In [17]:
soup.title.next_sibling

'\n'

In [18]:
for rodzienstwo in soup.title.next_siblings:
    print(rodzienstwo)



<link href="/wp-content/themes/politechnika-v2/assets/css/bootstrap.min.css" media="screen" rel="stylesheet" type="text/css"/>


<link href="/wp-content/themes/politechnika-v2/assets/css/flickity.css" media="screen" rel="stylesheet" type="text/css"/>


<link href="/wp-content/themes/politechnika-v2/assets/css/style.css" media="screen" rel="stylesheet" type="text/css"/>


<link href="/wp-content/themes/politechnika-v2/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>


<link href="/wp-content/themes/politechnika-wydzial/assets/css/responsive.css" media="screen" rel="stylesheet" type="text/css"/>


<link href="/wp-content/themes/politechnika-wydzial/assets/css/custom.css" media="screen" rel="stylesheet" type="text/css"/>


<link href="/wp-content/themes/politechnika-v2/assets/css/ie.css" media="screen" rel="stylesheet" type="text/css"/>


<link href="https://code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css" rel="stylesheet"/>


<link href="https://fonts.go

In [19]:
soup.title.previous_sibling

'\n'

In [20]:
for rodzienstwo in soup.title.previous_siblings:
    print(rodzienstwo)



<meta content="width=device-width, initial-scale=1" name="viewport"/>


<meta charset="utf-8"/>
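As the outputs above show, the nearest sibling of a tag is often just the newline between two tags. One way to skip such text nodes, sketched here on a hand-written `<head>` fragment, is `find_next_sibling(True)` and `find_previous_sibling(True)`, where `True` matches any tag.

```python
from bs4 import BeautifulSoup

html = "<head>\n<meta charset='utf-8'/>\n<title>Demo</title>\n<link href='a.css'/>\n</head>"
soup_demo = BeautifulSoup(html, "html.parser")
title = soup_demo.title

# The immediate neighbour is the newline between the two tags.
print(repr(title.next_sibling))

# Passing True matches tags only, so the whitespace strings are skipped.
next_tag = title.find_next_sibling(True)
previous_tag = title.find_previous_sibling(True)
print(next_tag.name, previous_tag.name)
```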




#### `find_all()`

In [21]:
soup.find_all('a')

[<a href="https://outlook.office.com/owa/polsl.pl/" rel="noopener" target="_blank">Poczta</a>,
 <a class="font-small" href="">A</a>,
 <a class="font-medium" href="">A<span>+</span></a>,
 <a class="font-large" href="">A<span>++</span></a>,
 <a class="mn-contrast" href="#">
 <img alt="contrast" src="/wp-content/themes/politechnika-v2/assets/images/contrast.svg"/>
 </a>,
 <a class="mn-sound" href="#">
 <img alt="mute" src="/wp-content/themes/politechnika-v2/assets/images/speakers.svg"/>
 </a>,
 <a href="/uczelnia/kontakt/">Kontakt</a>,
 <a href="https://www.polsl.pl/e-politechnika/">
 <img alt="UE" class="mn-ue-logo mn-no-mobile" src="https://www.polsl.pl/wp-content/uploads/2021/04/ue-pl.png"/>
 </a>,
 <a href="https://www.polsl.pl">
 <img alt="Godło Polski" src="https://www.polsl.pl/wp-content/uploads/2021/03/godlo_polski.svg"/>
 </a>,
 <a href="https://www.polsl.pl">
 <img alt="Politechnika Śląska" src="https://www.polsl.pl/wp-content/uploads/2021/03/logo-ps-white.svg"/>
 </a>,
 <a href

In [22]:
soup.find_all('a', limit=3)

[<a href="https://outlook.office.com/owa/polsl.pl/" rel="noopener" target="_blank">Poczta</a>,
 <a class="font-small" href="">A</a>,
 <a class="font-medium" href="">A<span>+</span></a>]
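Besides a tag name and `limit`, `find_all()` accepts attribute filters as keyword arguments. The sketch below, on an invented set of links, keeps only anchors that have an `href` and then only those whose `href` is an absolute URL.

```python
import re
from bs4 import BeautifulSoup

html = """
<a href="https://www.polsl.pl/rms/">RMS</a>
<a href="#">Top</a>
<a class="font-small">A</a>
<a href="/uczelnia/kontakt/">Kontakt</a>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# href=True keeps only the <a> tags that actually carry an href attribute.
with_href = soup_demo.find_all("a", href=True)

# A compiled regular expression is searched for in the attribute value.
absolute = soup_demo.find_all("a", href=re.compile(r"^https?://"))

print(len(with_href))
print([a["href"] for a in absolute])
```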

In [23]:
soup.find_all(['a', 'img'])

[<a href="https://outlook.office.com/owa/polsl.pl/" rel="noopener" target="_blank">Poczta</a>,
 <a class="font-small" href="">A</a>,
 <a class="font-medium" href="">A<span>+</span></a>,
 <a class="font-large" href="">A<span>++</span></a>,
 <a class="mn-contrast" href="#">
 <img alt="contrast" src="/wp-content/themes/politechnika-v2/assets/images/contrast.svg"/>
 </a>,
 <img alt="contrast" src="/wp-content/themes/politechnika-v2/assets/images/contrast.svg"/>,
 <a class="mn-sound" href="#">
 <img alt="mute" src="/wp-content/themes/politechnika-v2/assets/images/speakers.svg"/>
 </a>,
 <img alt="mute" src="/wp-content/themes/politechnika-v2/assets/images/speakers.svg"/>,
 <a href="/uczelnia/kontakt/">Kontakt</a>,
 <a href="https://www.polsl.pl/e-politechnika/">
 <img alt="UE" class="mn-ue-logo mn-no-mobile" src="https://www.polsl.pl/wp-content/uploads/2021/04/ue-pl.png"/>
 </a>,
 <img alt="UE" class="mn-ue-logo mn-no-mobile" src="https://www.polsl.pl/wp-content/uploads/2021/04/ue-pl.png"/>

In [24]:
for tag in soup.find_all(True):
    print(tag.name)

html
head
meta
meta
title
link
link
link
link
link
link
link
link
link
link
meta
link
link
script
meta
link
meta
meta
meta
meta
meta
meta
meta
meta
meta
meta
meta
meta
script
link
link
script
style
link
style
link
link
link
link
link
link
link
link
style
link
link
link
link
link
link
link
link
link
link
link
script
script
script
script
script
script
script
script
script
script
script
link
link
meta
link
link
link
script
script
style
link
style
link
link
link
meta
script
body
header
div
div
div
ul
li
a
div
div
a
a
span
a
span
a
img
a
img
ul
select
option
option
script
ul
li
a
a
img
div
div
div
div
div
a
img
div
a
img
div
ul
li
a
li
a
li
a
li
a
li
a
ul
li
a
li
a
ul
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
ul
select
option
option
script
div
img
form
button
input
div
div
a
a
span
a
span
a
img
button
span
span
span
div
div
ul
li
a
ul
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li
a
ul
li
a
li
a
li
a
li
a
li
a
li
a
li
a
li

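**Note** The `find_*` family of methods (for example `find_parent()`, `find_parents()`, `find_next_siblings()`, `find_previous_siblings()`) searches parents and siblings in the same way that `find_all()` searches descendants; passing `recursive=False` to `find_all()` restricts the search to direct children.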
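A short sketch of searching in different directions of the tree, using a made-up menu fragment (the markup and the name `soup_demo` are assumptions):

```python
from bs4 import BeautifulSoup

html = "<div id='menu'><ul><li><a href='/a'>A</a></li><li><a href='/b'>B</a></li></ul></div>"
soup_demo = BeautifulSoup(html, "html.parser")
link = soup_demo.find("a")

# Search upwards: the closest enclosing <div>.
parent_div = link.find_parent("div")
print(parent_div["id"])

# Search sideways: <li> elements that follow the current one.
following_items = link.parent.find_next_siblings("li")
print(len(following_items))

# Search direct children only: recursive=False stops find_all from descending.
direct_of_ul = soup_demo.ul.find_all("li", recursive=False)
direct_of_div = soup_demo.div.find_all("li", recursive=False)
print(len(direct_of_ul), len(direct_of_div))
```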

#### `get(attribute)`

In [25]:
soup.find_all('a')[0].get("href")

'https://outlook.office.com/owa/polsl.pl/'

In [26]:
for link in soup.find_all('a'):
    print(link.get('href'))

https://outlook.office.com/owa/polsl.pl/



#
#
/uczelnia/kontakt/
https://www.polsl.pl/e-politechnika/
https://www.polsl.pl
https://www.polsl.pl
https://rekrutacja.polsl.pl/
https://www.polsl.pl/rd1-cos/
https://www.polsl.pl/rjo15-sd/
https://www.polsl.pl/pracownik/
https://absolwenci.polsl.pl/
https://www.polsl.pl/uczelnia/
#
https://www.polsl.pl/rar/
https://www.polsl.pl/rau/
https://www.polsl.pl/rb/
https://www.polsl.pl/rch/
https://www.polsl.pl/re/
https://www.polsl.pl/rg/
https://www.polsl.pl/rib/
https://www.polsl.pl/rie/
https://www.polsl.pl/rm/
https://www.polsl.pl/rms/
https://www.polsl.pl/rmt/
https://www.polsl.pl/roz/
https://www.polsl.pl/rt/
https://www.polsl.pl/rif/



https://www.polsl.pl/idub/
https://www.polsl.pl/rms/wydzial/
https://www.polsl.pl/rms/wydzial/o-wydziale/
https://www.polsl.pl/rms/wydzial/historia-wydzialu/
https://www.polsl.pl/rms/wydzial/poczet-dziekanow/
https://www.polsl.pl/rms/wydzial/wladze-wydzialu/
https://www.polsl.pl/rms/wydzial/rada-dziekanska/
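The list above mixes absolute URLs, site-relative paths such as `/uczelnia/kontakt/`, empty values and `#` anchors. A possible clean-up step, sketched on a hand-written snippet, resolves relative paths against the page address with `urllib.parse.urljoin` and drops the entries that do not lead anywhere; `base_url` and `absolute_links` are names chosen for the example.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base_url = "https://www.polsl.pl/rms/"  # page the links were scraped from
html = """
<a href="https://www.polsl.pl/rar/">RAr</a>
<a href="/uczelnia/kontakt/">Kontakt</a>
<a href="#">Top</a>
<a href="">A</a>
<a>No href</a>
"""
soup_demo = BeautifulSoup(html, "html.parser")

absolute_links = []
for link in soup_demo.find_all("a"):
    href = link.get("href")  # None when the attribute is missing
    if not href or href.startswith("#"):
        continue  # skip missing, empty and in-page anchors
    absolute_links.append(urljoin(base_url, href))

print(absolute_links)
```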

#### `has_attr()`

In [27]:
klasy = []
for link in soup.find_all('a'):
    if link.has_attr('class') and link.get('class') not in klasy:
        klasy.append(link.get('class'))

print(klasy)

[['font-small'], ['font-medium'], ['font-large'], ['mn-contrast'], ['mn-sound'], ['mn-rss'], ['mn-read-more'], ['link'], [], ['mn-upominki-lnk']]
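The output above is a list of lists because Beautiful Soup treats `class` as a multi-valued attribute and returns it as a list of names. If only the distinct class names are of interest, one option is to collect them in a set, as in this sketch on invented markup:

```python
from bs4 import BeautifulSoup

html = """
<a class="font-small" href="">A</a>
<a class="link big" href="/x">X</a>
<a class="link" href="/y">Y</a>
<a href="/z">Z</a>
"""
soup_demo = BeautifulSoup(html, "html.parser")

class_names = set()
for link in soup_demo.find_all("a"):
    # get() with a default replaces the separate has_attr() check;
    # the value is a list such as ['link', 'big'].
    class_names.update(link.get("class", []))

print(sorted(class_names))
```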


In [28]:
linki_glowne = soup.find_all("a", class_="link")
print(linki_glowne)

[<a class="link" href="https://www.polsl.pl/rms/kandydat/"></a>, <a class="link" href="https://www.polsl.pl/rms/student/"></a>, <a class="link" href="https://www.polsl.pl/rms/student/biuro-obslugi-studenta/"></a>, <a class="link" href="https://www.polsl.pl/rms/student/samorzad-studencki/"></a>, <a class="link" href="https://www.polsl.pl/rms/kontakt/"></a>]


#### `select()`

In [29]:
soup.select('a')

[<a href="https://outlook.office.com/owa/polsl.pl/" rel="noopener" target="_blank">Poczta</a>,
 <a class="font-small" href="">A</a>,
 <a class="font-medium" href="">A<span>+</span></a>,
 <a class="font-large" href="">A<span>++</span></a>,
 <a class="mn-contrast" href="#">
 <img alt="contrast" src="/wp-content/themes/politechnika-v2/assets/images/contrast.svg"/>
 </a>,
 <a class="mn-sound" href="#">
 <img alt="mute" src="/wp-content/themes/politechnika-v2/assets/images/speakers.svg"/>
 </a>,
 <a href="/uczelnia/kontakt/">Kontakt</a>,
 <a href="https://www.polsl.pl/e-politechnika/">
 <img alt="UE" class="mn-ue-logo mn-no-mobile" src="https://www.polsl.pl/wp-content/uploads/2021/04/ue-pl.png"/>
 </a>,
 <a href="https://www.polsl.pl">
 <img alt="Godło Polski" src="https://www.polsl.pl/wp-content/uploads/2021/03/godlo_polski.svg"/>
 </a>,
 <a href="https://www.polsl.pl">
 <img alt="Politechnika Śląska" src="https://www.polsl.pl/wp-content/uploads/2021/03/logo-ps-white.svg"/>
 </a>,
 <a href

In [30]:
soup.select('a[class=link]')

[<a class="link" href="https://www.polsl.pl/rms/kandydat/"></a>,
 <a class="link" href="https://www.polsl.pl/rms/student/"></a>,
 <a class="link" href="https://www.polsl.pl/rms/student/biuro-obslugi-studenta/"></a>,
 <a class="link" href="https://www.polsl.pl/rms/student/samorzad-studencki/"></a>,
 <a class="link" href="https://www.polsl.pl/rms/kontakt/"></a>]

In [31]:
soup.select('a.link')

[<a class="link" href="https://www.polsl.pl/rms/kandydat/"></a>,
 <a class="link" href="https://www.polsl.pl/rms/student/"></a>,
 <a class="link" href="https://www.polsl.pl/rms/student/biuro-obslugi-studenta/"></a>,
 <a class="link" href="https://www.polsl.pl/rms/student/samorzad-studencki/"></a>,
 <a class="link" href="https://www.polsl.pl/rms/kontakt/"></a>]
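The two selectors above return the same result on this page, but they are not equivalent in general. The sketch below uses two invented anchors, one of which carries a second class, to show the difference.

```python
from bs4 import BeautifulSoup

html = '<a class="link" href="/a"></a><a class="link external" href="/b"></a>'
soup_demo = BeautifulSoup(html, "html.parser")

# [class=link] compares the whole attribute value with "link".
exact = soup_demo.select("a[class=link]")

# .link matches any element whose class list contains "link".
by_class = soup_demo.select("a.link")

print([a["href"] for a in exact])
print([a["href"] for a in by_class])
```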

In [32]:
id_ = []
for link in soup.find_all(True):
    if link.has_attr('id'):
        id_.append(link.get('id'))

print(id_)

['wp-block-library-css', 'global-styles-inline-css', 'menu-image-css', 'dashicons-css', 'buttons-css', 'mediaelement-css', 'wp-mediaelement-css', 'media-views-css', 'imgareaselect-css', 'admin-bar-css', 'admin-bar-inline-css', 'elementor-icons-css', 'elementor-frontend-legacy-css', 'elementor-frontend-css', 'elementor-post-470-css', 'front_css-css', 'front2_css-css', 'elementor-pro-css', 'elementor-global-css', 'elementor-post-698-css', 'authorizer-public-css-css', 'google-fonts-1-css', 'utils-js-extra', 'utils-js', 'jquery-core-js', 'jquery-core-js-after', 'jquery-migrate-js', 'moxiejs-js', 'plupload-js', 'wp-statistics-tracker-js-extra', 'wp-statistics-tracker-js', 'auth_public_scripts-js-extra', 'auth_public_scripts-js', 'menu-item-24844', 'lang_choice_1', 'menu-item-3366', 'navbarSupportedContent1', 'menu-item-3361', 'menu-item-3362', 'menu-item-3363', 'menu-item-3364', 'menu-item-3365', 'menu-item-18821', 'menu-item-3540', 'menu-item-3554', 'menu-item-5157', 'menu-item-5185', 'men

In [33]:
soup.select('#e-sticky-js')

[<script id="e-sticky-js" src="https://www.polsl.pl/rms/wp-content/plugins/elementor-pro/assets/lib/sticky/jquery.sticky.min.js?ver=3.4.2" type="text/javascript"></script>]

In [34]:
soup.select('div span')

[<span>+</span>,
 <span>++</span>,
 <span>+</span>,
 <span>++</span>,
 <span class="icon-bar"></span>,
 <span class="icon-bar"></span>,
 <span class="icon-bar"></span>,
 <span class="mn-slider-prev"></span>,
 <span class="mn-slider-next"></span>,
 <span>0</span>,
 <span>0</span>,
 <span>0</span>,
 <span>Mirosław Witkowski </span>,
 <span class="news-date">14.10.2025</span>,
 <span>Mirosław Witkowski </span>,
 <span class="news-date">09.10.2025</span>,
 <span>Mirosław Witkowski </span>,
 <span class="news-date">30.09.2025</span>,
 <span>Mirosław Witkowski </span>,
 <span class="news-date">26.09.2025</span>,
 <span>Mirosław Witkowski </span>,
 <span class="news-date">10.09.2025</span>,
 <span>Mirosław Witkowski </span>,
 <span class="news-date">09.09.2025</span>,
 <span>Mirosław Witkowski </span>,
 <span class="news-date">21.08.2025</span>,
 <span>Mirosław Witkowski </span>,
 <span class="news-date">08.07.2025</span>,
 <span>Mirosław Witkowski </span>,
 <span class="news-date">08.07.2025

#### Summary of `select()`

Selector | Description
:---:|:---:
`select('tag')`| finds all elements with the tag name `tag`
`select('.klasa')`| finds all elements with the class `klasa`
`select('#id')`| finds all elements with the id `id`
`select('tag.klasa')`| finds all `tag` elements with the class `klasa`
`select('tag tag_1')`| finds all `tag_1` elements nested at any depth inside a `tag` element
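The selector forms from the table can be tried out on a small hand-written document. The sketch also includes the child combinator `>` and `select_one()`, which are not in the table; the markup is invented for illustration.

```python
from bs4 import BeautifulSoup

html = """
<div id="news">
  <p class="lead">Intro</p>
  <section><p>Nested</p></section>
</div>
<p class="lead">Outside</p>
"""
soup_demo = BeautifulSoup(html, "html.parser")

counts = {
    "p": len(soup_demo.select("p")),              # by tag name
    ".lead": len(soup_demo.select(".lead")),      # by class
    "#news": len(soup_demo.select("#news")),      # by id
    "div p": len(soup_demo.select("div p")),      # descendants at any depth
    "div > p": len(soup_demo.select("div > p")),  # direct children only
}
print(counts)

# select_one() returns the first match (or None) instead of a list.
nested_text = soup_demo.select_one("section p").get_text()
print(nested_text)
```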


### Example

We fetch the list of news items from the Silesian University of Technology website.

In [35]:
from bs4 import BeautifulSoup

page = requests.get("https://www.polsl.pl/")
soup = BeautifulSoup(page.content,"html.parser")
news_container = soup.select(".mn-news-container")

news_container

[<div class="mn-news-container" data-index="0" data-visible-articles="1" style="margin-bottom: 20px; display: block;">
 <div class="mn-image-container">
 <img alt="photo" src="/wp-content/uploads/api-cache-images/71296/db745d2b-8a17-465c-a216-fedb7a09344b.jpg">
 <div class="blue-apla-hover"></div>
 </img></div>
 <div class="mn-list-view">
 <div class="mn-news-content">
 <p>Zamówienia publiczne - zaproszenie na spotkanie</p>
 <p class="mn-news-intro-text"></p>
 </div>
 <div class="mn-list-view-elements">
 <a class="mn-read-more" href="https://www.polsl.pl/ps_aktualnosci/zamowienia-publiczne-zaproszenie-na-spotkanie/"></a>
 <div class="mn-list-view-author"><label>Autor: </label> <span>Jolanta Skwaradowska </span></div>
 <div class="mn-list-view-pub-date"><label>Publikacja: </label> 2025-10-10 12:45:00 </div>
 <div class="mn-list-view-updt-date"><label>Aktualizacja: </label> 2025-10-13 13:10:41 </div>
 <span class="news-date">10.10.2025</span>
 </div>
 </div>
 </div>,
 <div class="mn-news

In [36]:
newsy = [i.select(".mn-news-content p")[0].get_text() for i in news_container]
newsy

['Zamówienia publiczne - zaproszenie na spotkanie',
 'Awans Politechniki Śląskiej w rankingu THE World University Rankings 2026',
 'Weź udział w XIII edycji Międzynarodowej Konferencji Naukowej EPAE',
 'Sukces SKN Data Science w HackYeah 2025',
 '80 lat Biblioteki Politechniki Śląskiej',
 'Władze Politechniki Śląskiej spotkały się z Samorządem Studenckim',
 'Zaproszenie na DocDay 2025',
 'Politechnika Śląska w raporcie IAEA nt. transformacji od węgla do atomu',
 'Naukowcy Politechniki Śląskiej na Forum Nowego Przemysłu',
 'Nauka płynie z dobrych źródeł – 20 lat Nocy Naukowców',
 'Spektakl charytatywny dla Mateusza',
 'Finał konkursu FameLab Poland 2025 ',
 'Wizyta laureatów konkursu Hackathon Creaton w ORLEN S.A.',
 'ZNP Politechniki Śląskiej świętuje 80-lecie działalności',
 'Politechnika Śląska w czołówce rankingu patentowego Urzędu Patentowego Rzeczypospolitej Polskiej',
 'Politechnika Śląska stała się areną dyskusji o edukacji',
 'Nauka płynie z dobrych źródeł! 20. Noc Naukowców Po

In [37]:
adresy = [i.select(".mn-list-view-elements a") for i in news_container]
adresy

[[<a class="mn-read-more" href="https://www.polsl.pl/ps_aktualnosci/zamowienia-publiczne-zaproszenie-na-spotkanie/"></a>],
 [<a class="mn-read-more" href="https://www.polsl.pl/ps_aktualnosci/awans-politechniki-slaskiej-w-rankingu-the-world-university-rankings-2026"></a>],
 [<a class="mn-read-more" href="https://www.polsl.pl/ps_aktualnosci/wez-udzial-w-xiii-edycji-miedzynarodowej-konferencji-naukowej-epae"></a>],
 [<a class="mn-read-more" href="https://www.polsl.pl/ps_aktualnosci/sukces-skn-data-w-hackyeah-2025"></a>],
 [<a class="mn-read-more" href="https://www.polsl.pl/ps_aktualnosci/80-lat-biblioteki-politechniki-slaskiej/"></a>],
 [<a class="mn-read-more" href="https://www.polsl.pl/ps_aktualnosci/wladze-politechniki-slaskiej-spotkaly-sie-z-samorzadem-studenckim"></a>],
 [<a class="mn-read-more" href="https://www.polsl.pl/ps_aktualnosci/zaproszenie-na-docday-2025"></a>],
 [<a class="mn-read-more" href="https://www.polsl.pl/ps_aktualnosci/politechnika-slaska-w-raporcie-iaea-nt-transfo

In [38]:
adresy_href = [i[0].get("href") for i in adresy[:10]]
adresy_href

['https://www.polsl.pl/ps_aktualnosci/zamowienia-publiczne-zaproszenie-na-spotkanie/',
 'https://www.polsl.pl/ps_aktualnosci/awans-politechniki-slaskiej-w-rankingu-the-world-university-rankings-2026',
 'https://www.polsl.pl/ps_aktualnosci/wez-udzial-w-xiii-edycji-miedzynarodowej-konferencji-naukowej-epae',
 'https://www.polsl.pl/ps_aktualnosci/sukces-skn-data-w-hackyeah-2025',
 'https://www.polsl.pl/ps_aktualnosci/80-lat-biblioteki-politechniki-slaskiej/',
 'https://www.polsl.pl/ps_aktualnosci/wladze-politechniki-slaskiej-spotkaly-sie-z-samorzadem-studenckim',
 'https://www.polsl.pl/ps_aktualnosci/zaproszenie-na-docday-2025',
 'https://www.polsl.pl/ps_aktualnosci/politechnika-slaska-w-raporcie-iaea-nt-transformacji-od-wegla-do-atomu/',
 'https://www.polsl.pl/ps_aktualnosci/naukowcy-politechniki-slaskiej-na-forum-nowego-przemyslu/',
 'https://www.polsl.pl/ps_aktualnosci/nauka-plynie-z-dobrych-zrodel-20-lat-nocy-naukowcow/']
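Indexing with `[0]` raises an `IndexError` as soon as one container lacks the expected element. A more defensive variant, sketched below, uses `select_one()` and keeps each title together with its link. The class names follow the page structure shown above, while the HTML snippet and the `example.com` address are placeholders.

```python
from bs4 import BeautifulSoup

html = """
<div class="mn-news-container">
  <div class="mn-news-content"><p>First news</p><p class="mn-news-intro-text"></p></div>
  <div class="mn-list-view-elements"><a class="mn-read-more" href="https://example.com/first/"></a></div>
</div>
<div class="mn-news-container">
  <div class="mn-news-content"><p>Second news</p></div>
</div>
"""
soup_demo = BeautifulSoup(html, "html.parser")

news_items = []
for container in soup_demo.select(".mn-news-container"):
    title_tag = container.select_one(".mn-news-content p")
    link_tag = container.select_one(".mn-list-view-elements a")
    news_items.append({
        "title": title_tag.get_text(strip=True) if title_tag else None,
        # A container without a link yields None rather than an IndexError.
        "href": link_tag.get("href") if link_tag else None,
    })

print(news_items)
```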

In [39]:
# The raw string keeps the backslashes of the Windows path intact
with open(r'C:\Users\Marcin\OneDrive - Politechnika Śląska\Pulpit\JSwAD stac\wykłady\6\nowy.html', 'r', encoding='utf-8') as f:
    s = f.read()
soup_n = BeautifulSoup(s,"html.parser")
print(soup_n.prettify())

<!DOCTYPE html>
<html>
 <head>
  <meta charset="utf-8"/>
  <title>
   Newsy
  </title>
 </head>
 <body>
  Ala ma pythona.
 </body>
</html>



In [40]:
original_tag = soup_n.body
original_tag.clear()

for i in range(10):
    new_tag_p = soup_n.new_tag("p", id="polsl"+str(i))
    
    new_tag_a = soup_n.new_tag('a')
    new_tag_a.attrs['href'] = adresy_href[i]
    new_tag_a.append(newsy[i])
    new_tag_p.append(new_tag_a)
    original_tag.append(new_tag_p)
    
print(str(original_tag))

<body><p id="polsl0"><a href="https://www.polsl.pl/ps_aktualnosci/zamowienia-publiczne-zaproszenie-na-spotkanie/">Zamówienia publiczne - zaproszenie na spotkanie</a></p><p id="polsl1"><a href="https://www.polsl.pl/ps_aktualnosci/awans-politechniki-slaskiej-w-rankingu-the-world-university-rankings-2026">Awans Politechniki Śląskiej w rankingu THE World University Rankings 2026</a></p><p id="polsl2"><a href="https://www.polsl.pl/ps_aktualnosci/wez-udzial-w-xiii-edycji-miedzynarodowej-konferencji-naukowej-epae">Weź udział w XIII edycji Międzynarodowej Konferencji Naukowej EPAE</a></p><p id="polsl3"><a href="https://www.polsl.pl/ps_aktualnosci/sukces-skn-data-w-hackyeah-2025">Sukces SKN Data Science w HackYeah 2025</a></p><p id="polsl4"><a href="https://www.polsl.pl/ps_aktualnosci/80-lat-biblioteki-politechniki-slaskiej/">80 lat Biblioteki Politechniki Śląskiej</a></p><p id="polsl5"><a href="https://www.polsl.pl/ps_aktualnosci/wladze-politechniki-slaskiej-spotkaly-sie-z-samorzadem-studencki

In [41]:
html = soup_n.prettify("utf-8")

with open("output.html", "wb") as file:
    file.write(html)
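The same page-building step can be sketched without a local template file or a fixed `range(10)`. Here a short hand-made list of (title, URL) pairs stands in for `newsy` and `adresy_href`; the pairs and the `example.com` addresses are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical (title, url) pairs standing in for newsy and adresy_href.
items = [("First news", "https://example.com/first/"),
         ("Second news", "https://example.com/second/")]

soup_out = BeautifulSoup("<html><body>Ala ma pythona.</body></html>", "html.parser")
body = soup_out.body
body.clear()  # drop the placeholder text

# Iterating over the pairs directly cannot run past the end of the data.
for index, (title, url) in enumerate(items):
    paragraph = soup_out.new_tag("p", id=f"polsl{index}")
    anchor = soup_out.new_tag("a", href=url)
    anchor.string = title
    paragraph.append(anchor)
    body.append(paragraph)

print(str(body))
```

With the real data, `zip(newsy, adresy_href)` could supply the pairs, which stops at the shorter of the two lists.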

### Thank you for your attention