# How to run this file

Required: 
- An OpenAI API key

Optional:
- A Langsmith API key


Your secrets (API keys) must be put in a ".env" file in the same folder as this file.
The format should be OPENAI_API_KEY = "sk-..." (replace with your own key)

In [31]:
import os
from dotenv import load_dotenv
load_dotenv()

True

# Approach

The goal of this file is to take artciles as an input, and output a list of actors and relationships to a file, that can be later retrieved to build a graph. Intermediate steps:
- Chunk articles in order to fit in OpenAI GPT-4 8k token window
- Extract entities using OpenAI functions, the instructor library and a Pydantic schema
- Filter and re-format the LLM output
 
 
 
<img src="streetpress_extraction.png" width="800" height="480">

# Pre-processing the articles

In [32]:
# List of articles URLs. Only one for testing purposes
articles = [
    'https://www.streetpress.com/sujet/1700562976-catacombes-fight-club-radicaux-extreme-droite-marc-cacqueray'
]

In [78]:
# Retrieve and chunk the article's text

import requests
from bs4 import BeautifulSoup

# Retrieves the text from the URL and get it chunked into chunks of less than max_char_per_chunk
def get_chunks(article_url, max_char_per_chunk=8000):
    response = requests.get(article_url)
    text = response.text
    soup = BeautifulSoup(text, 'html.parser')
    full_text = soup.find(class_='main-article').find('article')
    chunks = chunk_text(full_text, max_char_per_chunk)
    return chunks

# Chunks a text preserving the paragraphs. Creates chunks the biggest possible chunks within the char_max limit
def chunk_text(full_text, char_max):
    paragraphs = full_text.find_all('p')
    chunked_text = []
    current_chunk = ''
    for paragraph in paragraphs:
        if len(current_chunk) + len(paragraph.get_text()) <= char_max:
            current_chunk += paragraph.get_text() + '/n'
        elif len(current_chunk)>0:
            chunked_text.append(current_chunk.strip())
            current_chunk = ''
        elif len(current_chunk) == 0:
            chunked_text.append(paragraph.get_text().strip())
            current_chunk = ''
        else:
            raise ValueError("Unexpected value in chunk_text()")
        
    if len(current_chunk.strip())>0:
        chunked_text.append(current_chunk.strip())
    return chunked_text


In [34]:
# Test the funtions on the test article
example_chunk = get_chunks(articles[0])[0]
print(example_chunk[0:1000])

On trouve de tout dans les catacombes parisiennes. Objet de nombreux fantasmes, le réseau de galeries souterraines abrite une véritable vie six pieds sous terre. Aujourd’hui, les kilomètres de tunnels et les quelques grandes salles sont fréquentés aussi bien par des amateurs de frissons que des communautés proches des milieux punks ou antifascistes. Et depuis au moins 6 mois, au détour d’un couloir, vous pouvez tomber sur un fight club souterrain, organisé par des militants d’extrême droite. Surnommé le « Kata fight club », celui-ci n’a évidemment rien d’officiel. Il regroupe, selon nos informations, une petite dizaine de personnes qui descendent quasiment toutes les semaines. Ce groupe, comme le reste des cataphiles, se promène, écoute de la musique, discute… et s’entraîne à la boxe en short et pieds nus, parfois au milieu d’ossements humains./n




Dans une zone interdite des catacombes de Paris, des militants d’extrême droite organisent des fight clubs souterrains. / 
          Créd

# Entity extraction

- Entity extraction leverages the [Instructor library](https://github.com/jxnl/instructor)
- It leverage OpenAI functions and Pydantic schema definition
- The choice was to extract all the data in a single pass.

In [35]:
# Pydantic schema

from pydantic import BaseModel, Field
from typing import List, Literal


class Relationship(BaseModel):
    source: int
    target: int
    label: str
    rationale: str


class Actor(BaseModel):
    """
    Actor: person, group of people, club, company, administration, association, institution
    """
    id: str = Field(
        ...,
        description="Unique identifier for the actor, of the form actor_1, actor_2,...",
    )
    name: str
    label: Literal["PERSON", "ORGANIZATION"]

    key_findings: List[str] = Field(
        ...,
        description="List of relevant findings regarding this actor and its relationships to other radical right actors",
    )

    related_actors: List[str] = Field(
        ...,
        description="List all actors this entity is involved with.",
    )

    actor_reputation: str = Field(
        ...,
        description="Reflect on the actor, and answer the following questions: Is this actor known outside the scope of this content? For people who know about it, do they think about this actor as a radical right actor?",
    )
    belongs_to_radical_right: bool = Field(
        ...,
        description="true if the actor is reputed outside the scope of this content to be a radical right entity, false otherwise",
    )
    named_actor: bool = Field(
        ...,
        description="True if the name of actor is a rigid designator. False if it is a flaccid designator.",
    )


class DocumentExtraction(BaseModel):
    
    entities_list: List[str] = Field(
        ...,
        description="List all of entities mentionned in the document.",
    )

    entities_roles: List[str] = Field(
        ...,
        description="For each entity in entities_list, make one sentence explaining its role in the content.",
    )

    is_it_an_actor: List[str] = Field(
        ...,
        description="For each entity in entities_roles, answer the question: is it an actor? An actor is one of the following : person, group of people, club, company, administration, association, institution?",
    )
    
    actors_details: List[Actor] = Field(
        ...,
        description="Each actor should be its separate object",
    )
        
    relationships_list: List[Relationship] = Field(
        ...,
        description="List of relationships between actors.",
    )


In [77]:
# GPT-4 call
# Initial attempts with GPT4-Turbo displayed issues with accentuated characters. 
# The prompt is the result of multiple trial and errors. It seems to work pretty consistently well. 
# It is brittle however: small changes tend to produce regressions. So it's hard to improve further.
# The model was made as deterministic as possible (temperature = 0 and p as small as possible without being 0)


import instructor
from openai import OpenAI
from langsmith.run_helpers import traceable

# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())

@traceable(run_type="llm", name="openai.ChatCompletion.create")
def extract_entities(content) -> DocumentExtraction:
    return client.chat.completions.create(
        model="gpt-4",
        response_model=DocumentExtraction,
        temperature = 0,
        top_p=0.0000000000000001,
        max_retries=2,
        messages=[
            {
                "role": "system",
                "content": f'''
                We are mapping the networks of the radical right and their activities. 
                
                To do so we will proceed in steps: 
                1. Think step by step, and extract all entities from the text and write them in entities_list.
                2. For each entity in entities_list, add one sentence to entities_roles explaining its role in the content.
                3. For each entity in entities_roles, answer the question: is it an actor?
                4. If an entity is an actor, add an object to actors_details
                5. Fill in the actor_reputation by answering the following questions: Is this actor known outside the scope of this content? For people who know about it, do they think about this entity as a radical right entity?
                6. Mark belongs_to_radical_right as true if the actor is reputed as a radical right actor outside the scope of this content
                7. Mark named_actor as true if the name of actor is a rigid designator or false if it is a flaccid designator.
                8. Using all the information collected, think step by step to create the list of all relationships between actors in the content.

                ''',
            },
            {
                "role": "user",
                "content": content,
            },
        ],
    ) 

In [37]:
# Print the LLM response to inspect it

def print_llm_response(de: DocumentExtraction):
    print("entities_list:")
    [print(f"  - {entity}") for entity in de.entities_list]
    print("actors_details:")
    for actor in de.actors_details:
        print(f"  - id: {actor.id}")
        print(f"  - label: {actor.label}")
        print(f"  - named_actor: {actor.named_actor}")
        print("key_findings:")
        [print(f"    - {key_finding}") for key_finding in actor.key_findings]
        print(f"  - related_actors: {actor.related_actors}")
        print(f"  - actor_reputation: {actor.actor_reputation}")
        print(f"  - belongs_to_radical_right: {actor.belongs_to_radical_right}")
    print("entities_roles:")
    [print(f"  - {entity_role}") for entity_role in de.entities_roles]
    print("is_it_an_actor:")
    [print(f"  - {is_it_an_actor}") for is_it_an_actor in de.is_it_an_actor]
    print("relationships_list:")
    for rel in de.relationships_list:
        print(f"  - label: {rel.label}")
        print(f"    source: {rel.source}")
        print(f"    target: {rel.target}")
        print(f"    rationale: {rel.rationale}")

In [38]:
# Test LLM extraction on the the test chunk
raw_llm_response  = extract_entities(example_chunk)

In [39]:
# Print the result
print_llm_response(raw_llm_response)

# Post-process the LLM response

In [40]:
# Reframe an Actor object as a dict, keeping only the properties of interest.
# Add an id based on the name

import unicodedata
import re

def remove_accents(input_string):
    # Normalize the string to decompose accented characters
    nfkd_form = unicodedata.normalize('NFKD', input_string)
    # Return the string with accents removed
    return "".join([c for c in nfkd_form if not unicodedata.combining(c)])

def to_id(input_string):
    no_accents_string = remove_accents(input_string)
    alphanum_string = re.sub(r'[^a-zA-Z0-9 ]', '', no_accents_string)
    return alphanum_string.lower().replace(' ', '_')

def format_actor(actor):
    id = to_id(actor.name)
    id_dict={}
    id_dict[actor.id] = id
    key_information = "/n".join(actor.key_findings)
    return (
        {
            'id': id,
            'name': actor.name,
            'label': actor.label,
            'key_information': key_information
        }, 
        id_dict
    )

In [41]:
# Filter actors to keep only the ones that are named actors and that belong to the radical right
# Format the actors
# Returns also an id_map to map numerical ids to name ids

def filter_and_format_actors(actors):
    relevant_actors = [actor for actor in actors if (actor.belongs_to_radical_right & actor.named_actor)]
    formatted_actors = []
    id_map = {}
    for actor in relevant_actors:
        ids_dict={}
        formatted_actor, ids_dict = format_actor(actor)
        formatted_actors.append(formatted_actor)
        id_map = id_map | ids_dict

    return formatted_actors, id_map



In [42]:
# Test the filtering and formatting of actors on our text LLM call response

formatted_actors, id_map = filter_and_format_actors(raw_llm_response.actors_details)
print(formatted_actors)
print(id_map)

[{'id': 'pierre', 'name': 'Pierre', 'label': 'PERSON', 'key_information': 'Member of a radical right group/nInvolved in violent activities'}, {'id': 'tristan_c', 'name': 'Tristan C.', 'label': 'PERSON', 'key_information': 'Leader of a radical right group/nInvolved in violent activities'}, {'id': 'gud', 'name': 'Gud', 'label': 'ORGANIZATION', 'key_information': 'Radical right group/nInvolved in violent activities'}, {'id': 'dm', 'name': 'DM', 'label': 'ORGANIZATION', 'key_information': 'Radical right group/nInvolved in violent activities'}, {'id': 'titeuf', 'name': 'Titeuf', 'label': 'ORGANIZATION', 'key_information': 'Radical right group/nInvolved in violent activities'}, {'id': 'division_martel', 'name': 'Division Martel', 'label': 'ORGANIZATION', 'key_information': 'Radical right group/nInvolved in violent activities'}, {'id': 'vandal_besak_de_besancon', 'name': 'Vandal Besak de Besançon', 'label': 'ORGANIZATION', 'key_information': 'Radical right group/nInvolved in violent activitie

In [43]:
# Keep only the relationships that have both a source and target within the refined list of actors

def format_rels(rels, id_map):
    formatted_rels = []
    for rel in rels:
        formatted_rel = {}
        if id_map.get(str(rel.source)) and id_map.get(str(rel.target)):
            formatted_rel = {
                'source': id_map.get(str(rel.source)),
                'target': id_map.get(str(rel.target)),
                'label': rel.label,
                'rationale': rel.rationale
            }
            formatted_rels.append(formatted_rel)

    return formatted_rels


In [44]:
# Test the filtering and formatting of relationships on our test LLM call response

formatted_rels = format_rels(raw_llm_response.relationships_list, id_map)
print(formatted_rels)

[{'source': 'pierre', 'target': 'tristan_c', 'label': 'Member-Leader', 'rationale': 'Pierre is a member of the group led by Tristan C.'}, {'source': 'pierre', 'target': 'gud', 'label': 'Member-Group', 'rationale': 'Pierre is a member of Gud.'}, {'source': 'pierre', 'target': 'dm', 'label': 'Member-Group', 'rationale': 'Pierre is a member of DM.'}, {'source': 'pierre', 'target': 'titeuf', 'label': 'Member-Group', 'rationale': 'Pierre is a member of Titeuf.'}, {'source': 'pierre', 'target': 'division_martel', 'label': 'Member-Group', 'rationale': 'Pierre is a member of Division Martel.'}, {'source': 'pierre', 'target': 'vandal_besak_de_besancon', 'label': 'Member-Group', 'rationale': 'Pierre is a member of Vandal Besak de Besançon.'}, {'source': 'pierre', 'target': 'korrigan_squad', 'label': 'Member-Group', 'rationale': 'Pierre is a member of Korrigan Squad.'}, {'source': 'pierre', 'target': 'lenny_m', 'label': 'Member-Group', 'rationale': 'Pierre is a member of the group led by Lenny M.

# Save LLM responses to disk

In [45]:
# Initialization : 
# - creates a folder for the files if it does not exist
# - retrieves the highest numerical index from the stored files

import os
import glob
import re

folder_name = "llm_responses"  # Replace with your folder name

# Create the folder if it does not exist
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

def find_highest_doc_index(folder_path):
    # Pattern to match 'article-x_chunk-y' and capture 'x'
    pattern = re.compile(r"article-(\d+)_chunk-\d+")

    highest_index = 0

    # Iterate through all files in the folder
    for filename in glob.glob(os.path.join(folder_path, "*")):
        match = pattern.search(os.path.basename(filename))
        if match:
            # Extract 'x' and update the highest index
            index = int(match.group(1))
            if index > highest_index:
                highest_index = index

    return highest_index

find_highest_doc_index(os.path.join(".", folder_name))

2

In [48]:
# Save to disk

import json
import datetime

def save_entities_to_disk(article_url, doc_index, chunk_index, actors, rels, folder_name="llm_responses"):

    # Create the folder if it does not exist
    folder_path = os.path.join(".", folder_name)
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

    file_name = "article-" + str(doc_index) + "_chunk-" + str(chunk_index) + ".json"
    file_path = os.path.join(folder_path, file_name)

    # Data to save
    data = {
        "article": article_url,
        "processing_date": datetime.datetime.now().isoformat(),
        "actors": actors,
        "relationships": rels
    }

    # Save data to a file
    with open(file_path, 'w') as file:
        json.dump(data, file)

In [49]:
# Test saving to the disk on our test LLM call response 

doc_index = find_highest_doc_index(os.path.join(".", folder_name))+1
save_entities_to_disk(articles[0], doc_index, "1", formatted_actors, formatted_rels)

# Keep track of which articles have been processed yet

In [50]:
# Retrieve and update the list of processed articles in order to be able to add to the list without re-running on all articles

import os
import json

def get_processed_articles(folder_name = "llm_responses", file_name = "processed_articles.json"):

    folder_path = os.path.join(".", folder_name)
    file_path = os.path.join(folder_path, file_name)

    if os.path.exists(folder_path) and os.path.isfile(file_path):
        # Folder and file exist
        with open(file_path, 'r') as file:
            processed_articles = json.load(file)

    return processed_articles

def update_processed_articles(articles, folder_name="llm_responses", file_name = "processed_articles.json"):

    # Create the folder if it does not exist
    folder_path = os.path.join(".", folder_name)
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

    file_path = os.path.join(folder_path, file_name)

    with open(file_path, 'w') as file:
        json.dump(articles, file)
    

In [51]:
get_processed_articles()

['https://www.streetpress.com/sujet/1700562976-catacombes-fight-club-radicaux-extreme-droite-marc-cacqueray',
 'https://www.streetpress.com/sujet/1697625538-fafleaks-division-martel-bebes-neonazis-parisiens-gud-marc-cacqueray']

# Batch process articles (putting it all together)

In [85]:
# List of articles URLs
articles = [
    'https://www.streetpress.com/sujet/1700562976-catacombes-fight-club-radicaux-extreme-droite-marc-cacqueray',
    'https://www.streetpress.com/sujet/1697625538-fafleaks-division-martel-bebes-neonazis-parisiens-gud-marc-cacqueray',
    'https://www.streetpress.com/sujet/1702036421-remparts-lyon-groupe-extreme-droite-violent-viseur-justice',
    'https://www.streetpress.com/sujet/1701869410-division-martel-dissoute-gouvernement-darmanin-gros-lardon-extreme-droite-neonazis',
    'https://www.streetpress.com/sujet/1701863544-extreme-droite-radicale-gilets-jaunes-identitaires-mort-thomas',
    'https://www.streetpress.com/sujet/1701443531-neonazis-division-martel-debarquent-anniversaire-streetpress-extreme-droite-menaces',
    'https://www.streetpress.com/sujet/1701278537-rassemblement-national-rn-radicaux-identitaires-monarchistes-neonazis-le-pen-bardella',
    'https://www.streetpress.com/sujet/1701276408-huit-identitaires-interpelles-lyon-collages-rassemblement-thomas-extreme-droite',
    'https://www.streetpress.com/sujet/1701272670-parloir-colombier-chretien-faf-paris-extreme-droite-don-fillon',
    'https://www.streetpress.com/sujet/1701088297-descente-raciste-romans-isere-profils-interpelles-thomas-extreme-droite-neonazis',
    'https://www.streetpress.com/sujet/1699893281-tabassage-intimidations-soutien-lepen-rassemblement-national-ligue-defense-juive-manifestation-antisemitisme-ldj',
    'https://www.streetpress.com/sujet/1699454359-marc-cacqueray-valmenier-noble-nazi-gud-extreme-droite-division-martel-c9m-chatillon-loustau',
    'https://www.streetpress.com/sujet/1698230999-notre-dame-orveau-lycee-fafs-extreme-droite-cacqueray-lepen-villiers-chatillon-gud-cathos-tradis',
    'https://www.streetpress.com/sujet/1698230796-asla-association-defense-identitaires-sos-mediterannee',
    'https://www.streetpress.com/sujet/1697194029-civitas-catholique-integriste-continue-activites-malgre-dissolution-pelerinage',
    'https://www.streetpress.com/sujet/1695658083-eliot-bertin-nouveau-chef-ultra-violent-extreme-droite-lyonnaise',
    'https://www.streetpress.com/sujet/1695652604-bar-huitres-prefere-extreme-droite-parisienne-conversano-zemmour',
    'https://www.streetpress.com/sujet/1695657250-bikers-neonazis-serge-ayoub-installer-nantes-gremium-hells-angels-moto',
    'https://www.streetpress.com/sujet/1695393648-annecy-militant-radicaux-extreme-droite-condamne-manifestation',
    'https://www.streetpress.com/sujet/1694606037-tabassage-etudiants-insultes-artistes-drag-oriflamme-groupuscule-extreme-droite-rennes',
    'https://www.streetpress.com/sujet/1694173867-proces-france-maroc-charges-levees-tribunal-paris-ratonnade',
    'https://www.streetpress.com/sujet/1694096841-militaires-neonazis-armee-suspendus-revelations-streetpress',
    'https://www.streetpress.com/sujet/1693819816-nazi-militaires-neonazis-regiment-belfort-besancon-vandal',
    'https://www.streetpress.com/sujet/1687965898-action-francaise-ecole-cadres-reactionnaires-darmanin-ministres',
    'https://www.streetpress.com/sujet/1687961182-municipales-2026-machiavelique-extreme-droite-civitas',
    'https://www.streetpress.com/sujet/1686916591-ratonnade-paris-tres-jeunes-militants-extreme-droite-garde-vue-police-lycee-racisme-division-martel',
    'https://www.streetpress.com/sujet/1685464152-millionnaires-angevins-sponsors-extreme-droite-violente',
    'https://www.streetpress.com/sujet/1685458445-dominique-delawarde-ancien-general-pro-russe-antisemite-rn-soral-cnews',
    'https://www.streetpress.com/sujet/1685010699-ahamada-siby-cuisinier-sans-papiers-vire-patron-soutien-zemmour-extreme-droite-racisme',
    'https://www.streetpress.com/sujet/1683815533-facs-agressions-neonazis-retour-gud-zouaves-C9M-nationalistes-revolutionnaires-extreme-droite',
    'https://www.streetpress.com/sujet/1683295666-hooligans-neonazis-tabassent-attache-parlementaire-france-insoumise-stade-france-coupe-nantes-toulouse-agression-racisme',
    'https://www.streetpress.com/sujet/1682347242-dissolution-generation-identitaire-jeunesse-extreme-droite-nationaliste-revolutionnaire-radicaux',
    'https://www.streetpress.com/sujet/1682354757-youtubeur-extreme-droite-georges-youtube-journalisme-enquete-black-blocs',
    'https://www.streetpress.com/sujet/1680867145-militants-extreme-droite-projet-attentat-contre-bilal-hassani-harcelement',
    'https://www.streetpress.com/sujet/1680092832-alain-soral-leaks-comptes-bancaires-societes-associations-buisness-fachosphere-complotisme',
    'https://www.streetpress.com/sujet/1680093736-lettre-extreme-droite-faits-documents-agrement-publication-presse-ministere-culture',
    'https://www.streetpress.com/sujet/1680106046-reims-neonazis-mesos-collent-affiches-rn-hooligans-lepen-bardella',
    'https://www.streetpress.com/sujet/1680100724-propagande-russe-francais-pierre-de-gaulle-elie-hatem-rencontre-lavrov-ministre-steven-seagal',
    'https://www.streetpress.com/sujet/1677516324-neonazis-gud-revent-conquerir-paves-paris-fiches-etat-extreme-droite',
    'https://www.streetpress.com/sujet/1675174140-neonazis-fans-armes-extreme-droite-identitaire-barjols-macron-attentat-migrants',
    'https://www.streetpress.com/sujet/1671632946-rappeur-goldofaf-retour-extreme-droite-chanteur-rap-musique-menace-journaliste',
    'https://www.streetpress.com/sujet/1671546805-omerta-pedocriminalite-cathos-integristes-fraternite-saint-pie-x-pedophilie-eglise',
    'https://www.streetpress.com/sujet/1669732008-bienvenue-nouveaux-hooligans-hools-francais-stade-foot-francais-violence-extreme-droite-video-youtube'
]

In [86]:
import datetime

highest_index = find_highest_doc_index(os.path.join(".", folder_name))
processed_articles = get_processed_articles()

filtered_articles = [article for article in articles if article not in processed_articles]
print(f"Number of articles not yet processed: {len(filtered_articles)}")

for article_index, article in enumerate(filtered_articles, highest_index+1):
    print(f"\n\nNEW ARTICLE: starting to work on {article}")
    chunks = get_chunks(article)
    print(f"  chunks lengths: {[len(chunk) for chunk in chunks]}")
    for chunk_index, chunk in enumerate(chunks, 1):
        print(f"  New chunk: chunk {chunk_index}. Started at: {datetime.datetime.now()}")
        try:
            raw_llm_response  = extract_entities(chunk)
            formatted_actors, id_map = filter_and_format_actors(raw_llm_response.actors_details)
            formatted_rels = format_rels(raw_llm_response.relationships_list, id_map)
            save_entities_to_disk(article, article_index, chunk_index, formatted_actors, formatted_rels)
        except:
            print("    Error processing the chunk")
    processed_articles.append(article)
    update_processed_articles(processed_articles)


    

Number of articles not yet processed: 29


NEW ARTICLE: starting to work on https://www.streetpress.com/sujet/1697194029-civitas-catholique-integriste-continue-activites-malgre-dissolution-pelerinage
  chunks lengths: [5624]
  New chunk: chunk 1. Started at: 2023-12-14 11:44:14.808298


NEW ARTICLE: starting to work on https://www.streetpress.com/sujet/1695658083-eliot-bertin-nouveau-chef-ultra-violent-extreme-droite-lyonnaise
  chunks lengths: [7944, 5885]
  New chunk: chunk 1. Started at: 2023-12-14 11:49:54.079262
  New chunk: chunk 2. Started at: 2023-12-14 11:57:10.292641


NEW ARTICLE: starting to work on https://www.streetpress.com/sujet/1695652604-bar-huitres-prefere-extreme-droite-parisienne-conversano-zemmour
  chunks lengths: [7972, 520]
  New chunk: chunk 1. Started at: 2023-12-14 12:03:17.021262
  New chunk: chunk 2. Started at: 2023-12-14 12:05:18.631618


NEW ARTICLE: starting to work on https://www.streetpress.com/sujet/1695657250-bikers-neonazis-serge-ayoub-installer-n

In [82]:

response = requests.get("https://www.streetpress.com/rubriques/extreme-droite")
text = response.text
soup = BeautifulSoup(text, 'html.parser')

In [83]:
soup

<!DOCTYPE html>

<html data-n-head="%7B%22lang%22:%7B%22ssr%22:%22fr%22%7D%7D" data-n-head-ssr="" lang="fr">
<head>
<meta charset="utf-8" data-n-head="ssr"/><meta content="width=device-width, initial-scale = 1.0, maximum-scale=1.0, user-scalable=no" data-n-head="ssr" name="viewport"/><meta content="Actualité, info, paris, banlieue, streetpress, street press, magazine urbain, société, politique, site participatif" data-hid="keywords" data-n-head="ssr" name="keywords"/><meta content="StreetPress" data-hid="author" data-n-head="ssr" name="author"/><meta content="max-snippet:-1, max-image-preview:large, max-video-preview:-1" data-n-head="ssr" name="robots"/><meta content="943140015819496" data-hid="fb:app_id" data-n-head="ssr" property="fb:app_id"/><meta content="website" data-hid="og:type" data-n-head="ssr" property="og:type"/><meta content="StreetPress" data-hid="og:title" data-n-head="ssr" property="og:title"/><meta content="https://www.streetpress.com" data-hid="og:url" data-n-head="ss