# RAG

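This work looks at the implementation of Retrieval-Augmented Generation (RAG) within NHS England. This notebook contains a simple RAG pipeline, which embeds documents with the configured `EMBEDDING_MODEL`, stores them in a persistent database under `PERSIST_DIRECTORY`, and retrieves relevant passages to help answer questions. It also contains some experiments in extracting text from NHS website (nhs.uk) pages.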

In [1]:
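import glob
import json
import os

import requests
import toml
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from tqdm import tqdm

import src.models as models

config = toml.load("config.toml")
load_dotenv(".secrets")

# The Anthropic client reads its key from ANTHROPIC_API_KEY; fail with a clear message if it is missing
anthropic_key = os.getenv("anthropic_key")
if anthropic_key is None:
    raise RuntimeError("'anthropic_key' is not set in .secrets")
os.environ["ANTHROPIC_API_KEY"] = anthropic_key

# In dev mode, use a separate persistence directory so the main database is not modified
if config['DEV_MODE']:
    config['PERSIST_DIRECTORY'] += "/dev"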


In [2]:
rag_pipeline = models.RagPipeline(config['EMBEDDING_MODEL'], config['PERSIST_DIRECTORY'])



In [3]:
if not config['DEV_MODE']:  # in dev mode, skip loading and reuse the documents already in the database
    rag_pipeline.load_documents()
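`load_documents` lives in `src.models` and is not shown here. As a rough illustration of the kind of preprocessing a RAG loader usually performs before embedding, the sketch below splits a text into overlapping fixed-size character windows. The function name `chunk_text` and the `chunk_size`/`overlap` defaults are hypothetical and are not taken from `RagPipeline`.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper, not RagPipeline's code)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, so no trailing fragment
        # is emitted that is already contained in the previous chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


# Usage: 1,200 characters in 500-character windows that overlap by 50 characters
sample = "x" * 1200
print([len(c) for c in chunk_text(sample)])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.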

In [4]:
question = "Explain the main benefits of Reproducible Analytical Pipelines (RAP)"

# rag=False: query the model directly, without retrieved context
result = rag_pipeline.answer_question(question, rag=False)

print(result)



- Verifiability: Reproducible pipelines allow others to easily verify your analysis methodology and reproduce your results, increasing confidence. 

- Replication: RAPs make it easy for others to replicate your analysis workflow on new data. A published data pipeline can simply be applied to new data rather than redeveloped.

- Automation: RAPs apply automation, reducing opportunities for human error. They also save labor by avoiding manual processes and rework.

- Auditability: Pipelines provide a record of the exact data sourcing, cleaning, modeling, etc. steps. This allows auditing data provenance and analysis decisions.

- Collaboration: Shared pipelines enable collaborative analysis development across teams and organizations. Different experts can contribute modules into an integrated workflow.  

- Reusability: Modular pipelines and workflows can be reused across analyses and projects, avoiding duplication of effort. Code, models, etc can be abstracted into reusable libraries and

In [5]:
rag_pipeline.retriever.get_relevant_documents("What is analytical best practice?")

[Document(page_content='which analysts can develop analytic code for subsequent execution against real data; or to examine for training purposes; or as a service to help new arrivals in a field evaluate the feasibility of using a given dataset for a given purpose.', metadata={'file_path': 'docs\\goldacre_review.txt'}),
 Document(page_content='The network was created to facilitate the sharing of best practice, provide consultancy services, build capability, create tools, guidance and standards, and monitor performance of the different analysis functions. The government analysis function Career Framework, for example, was collaboratively developed by all the analytical professions and is designed to describe typical analytical roles across government, including the main skills required to perform each role at varied skill level. Where specific skills are highlighted, the framework signposts to relevant training available such as that available for: data visualisation; communicating insig
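The retriever above returns the stored chunks most similar to a query, and `answer_question(..., rag=True)` presumably passes such chunks to the model as context. The self-contained sketch below illustrates that retrieve-then-prompt pattern, using a toy bag-of-words cosine similarity in place of a real embedding model. The names `embed`, `retrieve` and `build_prompt`, the sample chunks and the prompt wording are all assumptions for illustration, not the code in `src.models`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts (stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Prepend retrieved context to the question (hypothetical prompt format)."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


chunks = [
    "Reproducible analytical pipelines automate analysis to reduce human error.",
    "Bulimia is an eating disorder and mental health condition.",
    "Sharing best practice helps analysts build capability across government.",
]
top = retrieve("How do reproducible pipelines reduce error?", chunks, k=1)
print(top)
print(build_prompt("How do reproducible pipelines reduce error?", top))
```

A real pipeline swaps `embed` for a dense embedding model and the sort for a vector-store similarity search, but the shape (score, take the top k, insert into the prompt) is the same.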

In [11]:
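from bs4 import BeautifulSoup

# Raw HTML snapshot of the NHS bulimia hub page (https://www.nhs.uk/mental-health/conditions/bulimia/),
# used as test input by the parsing cells below.
# Note: despite its name, `soup` holds an HTML string, not a BeautifulSoup object.
soup = """<!DOCTYPE html>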
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <meta name="description" content="Read about bulimia nervosa, an eating disorder and mental health condition where someone is binge eating, then making themselves vomit or using laxatives to purge the food from their body">
    <meta name="google-site-verification" content="a4yrvgi5ZlBnKWfqFKkQ3_mEjqow_fpwbtF2bUTmZgc" />
    
    <link rel="canonical" href="https://www.nhs.uk/mental-health/conditions/bulimia/">

    

    <title>Bulimia - NHS</title>

    <link href="https://assets.nhs.uk/" rel="preconnect" crossorigin>
    <link type="font/woff2" href="https://assets.nhs.uk/fonts/FrutigerLTW01-55Roman.woff2" rel="preload" as="font" crossorigin>
    <link type="font/woff2" href="https://assets.nhs.uk/fonts/FrutigerLTW01-65Bold.woff2" rel="preload" as="font" crossorigin>

    <link rel="stylesheet" href="/static/nhsuk/css/main.e672d1de64f0.css" type="text/css" />

    
      
    

    
      <script type="application/ld+json">{"@context": "http://schema.org", "@type": "MedicalWebPage", "about": {"@type": "WebPage", "alternateName": "", "name": "Bulimia"}, "author": {"@type": "Organization", "email": "nhswebsite.servicedesk@nhs.net", "logo": "https://www.nhs.uk/nhscwebservices/documents/logo1.jpg", "name": "NHS website", "url": "https://www.nhs.uk"}, "breadcrumb": {"@context": "http://schema.org", "@type": "BreadcrumbList", "itemListElement": [{"@type": "ListItem", "item": {"@id": "https://www.nhs.uk/mental-health/", "genre": [], "name": "Mental health"}, "position": 0}, {"@type": "ListItem", "item": {"@id": "https://www.nhs.uk/mental-health/conditions/", "genre": [], "name": "Mental health conditions"}, "position": 1}, {"@type": "ListItem", "item": {"@id": "https://www.nhs.uk/mental-health/conditions/bulimia/", "genre": ["Condition"], "name": "Bulimia"}, "position": 2}]}, "copyrightHolder": {"@type": "Organization", "name": "Crown Copyright"}, "dateModified": "2022-03-29T10:26:45+00:00", "description": "Bulimia is where someone is binge eating, then making themselves vomit or using laxatives to purge the food from their body.", "genre": ["Condition"], "hasPart": [], "headline": "Bulimia", "keywords": "", "license": "https://developer.api.nhs.uk/terms", "name": "Bulimia", "schemaVersion": "http://schema.org/version/13.0/", "url": "https://www.nhs.uk/mental-health/conditions/bulimia/"}</script>
    

    
  


    <link rel="shortcut icon" href="/static/nhsuk/img/favicons/favicon.68c7f017cfba.ico" type="image/x-icon">
    <link rel="apple-touch-icon" href="/static/nhsuk/img/favicons/apple-touch-icon-180x180.15a5044def06.png">
    <link rel="mask-icon" href="/static/nhsuk/img/favicons/favicon.25bc75538faa.svg" color="#005eb8">
    <link rel="icon" sizes="192x192" href="/static/nhsuk/img/favicons/favicon-192x192.43924bfe6c7e.png">
    <meta name="msapplication-TileImage" content="/static/nhsuk/img/favicons/mediumtile-144x144.cf4985872492.png">
    <meta name="msapplication-TileColor" content="#005eb8">
    <meta name="msapplication-square70x70logo" content="/static/nhsuk/img/favicons/smalltile-70x70.29f75b06cf75.png">
    <meta name="msapplication-square150x150logo" content="/static/nhsuk/img/favicons/mediumtile-150x150.89688d93af5b.png">
    <meta name="msapplication-wide310x150logo" content="/static/nhsuk/img/favicons/widetile-310x150.535c3996630d.png">
    <meta name="msapplication-square310x310logo" content="/static/nhsuk/img/favicons/largetile-310x310.294742e00ff4.png">

    

    
      <meta property="og:url" content="https://www.nhs.uk/mental-health/conditions/bulimia/">
      <meta property="og:site_name" content="nhs.uk">
      <meta property="og:title" content="Bulimia"/>
      <meta property="og:description" content="Read about bulimia nervosa, an eating disorder and mental health condition where someone is binge eating, then making themselves vomit or using laxatives to purge the food from their body">
      <meta property="og:type" content="website">
      <meta property="og:locale" content="en_GB">
      <meta property="og:image" content="https://www.nhs.uk/static/nhsuk/img/default-social-image.a74435697f45.png">
      <meta property="og:image:alt" content="nhs.uk"/>
      <meta property="article:author" content="https://www.facebook.com/nhswebsite/">
      <meta property="article:modified_time" content="29 Mar 2022, 11:26 a.m.">
      <meta property="article:published_time" content="11 Feb 2021, 2:58 p.m.">
      
      <meta name="twitter:card" content="summary_large_image">
      <meta name="twitter:site" content="@nhsuk">
      <meta name="twitter:creator" content="@nhsuk">
      <meta name="twitter:image:alt" content="nhs.uk"/>
    


    <script src="/static/nhsuk/js/main.1d18ca98d926.js" defer></script>

    
      <script src="https://assets.nhs.uk/scripts/cookie-consent.js" defer></script>
    

    

    
      <script type="application/javascript">window.digitalData=
            {"page": {
                "pageInfo": {
                    "pageName": "nhs:web:mental-health:conditions:bulimia"
               },
                "category":
                    {
            "primaryCategory": "mental-health",
            "subCategory1":"conditions",
            "subCategory2":"bulimia",
            "subCategory3":""
            }
                },
               };
            </script>
      
      <script src="//assets.adobedtm.com/launch-ENe7f6cdd7cc05409b86547d9153429788.min.js" type="text/plain" data-cookieconsent="statistics" async></script>
    

    <script>
      window.NHSUK_SETTINGS = {};
      window.NHSUK_SETTINGS.BANNER_API_URL = "//www.nhs.uk/externalservices/surveyfeedapi/api/bannerfeed";
      window.NHSUK_SETTINGS.BANNER_TEST_API_URL = "//www.nhs.uk/externalservices/surveyfeedapi/api/testfeed";
      
      window.NHSUK_SETTINGS.SUGGESTIONS_TEST_HOST = "//api.nhs.uk/site-search/autocomplete";
      window.NHSUK_SETTINGS.SEARCH_TEST_HOST = "//nhs.uk/search/results";
      window.NHSUK_SETTINGS.USER_FEEDBACK_STORE_ENDPOINT = "https://nhsuk-user-feedback-func-prod-uks.azurewebsites.net/";
    </script>  

    
  </head>

  <body class="">
    <script>
      document.body.className = ((document.body.className) ? document.body.className + ' js-enabled' : 'js-enabled');
    </script>
    

    
      <a class="nhsuk-skip-link" href="#maincontent">Skip to main content</a>


      

<header class="nhsuk-header" role="banner">
  <div class="nhsuk-width-container nhsuk-header__container beta-header">
    

<div class="nhsuk-header__logo">
  <a class="nhsuk-header__link" href="/" aria-label="NHS homepage">
    
    <svg class="nhsuk-logo" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 40 16" height="40" width="100">
      <path class="nhsuk-logo__background" fill="#005eb8" d="M0 0h40v16H0z"></path>
      <path class="nhsuk-logo__text" fill="#fff" d="M3.9 1.5h4.4l2.6 9h.1l1.8-9h3.3l-2.8 13H9l-2.7-9h-.1l-1.8 9H1.1M17.3 1.5h3.6l-1 4.9h4L25 1.5h3.5l-2.7 13h-3.5l1.1-5.6h-4.1l-1.2 5.6h-3.4M37.7 4.4c-.7-.3-1.6-.6-2.9-.6-1.4 0-2.5.2-2.5 1.3 0 1.8 5.1 1.2 5.1 5.1 0 3.6-3.3 4.5-6.4 4.5-1.3 0-2.9-.3-4-.7l.8-2.7c.7.4 2.1.7 3.2.7s2.8-.2 2.8-1.5c0-2.1-5.1-1.3-5.1-5 0-3.4 2.9-4.4 5.8-4.4 1.6 0 3.1.2 4 .6"></path>
    </svg>
    
    
  </a>
</div>


    
      <div class="nhsuk-header__content" id="content-header">
        <div class="nhsuk-header__search">
            <div class="nhsuk-header__search-wrap beta-header__search-wrap js-show" id="wrap-search">
              <form class="nhsuk-header__search-form beta-header__search-form" id="search" action="/search/" method="get" role="search">
                <label class="nhsuk-u-visually-hidden" for="search-field">Search the NHS website</label>
                <div class="autocomplete-container" id="autocomplete-container"></div>
                <input class="nhsuk-search__input" id="search-field" name="q" type="search" placeholder="Search" autocomplete="off" >
                <button class="nhsuk-search__submit beta-search__submit" type="submit">
                  <svg class="nhsuk-icon nhsuk-icon__search beta-icon__search" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false">
                    <path d="M19.71 18.29l-4.11-4.1a7 7 0 1 0-1.41 1.41l4.1 4.11a1 1 0 0 0 1.42 0 1 1 0 0 0 0-1.42zM5 10a5 5 0 1 1 5 5 5 5 0 0 1-5-5z"></path>
                  </svg>
                  <span class="nhsuk-u-visually-hidden">Search</span>
                </button>
              </form>
            </div>
          </div>
      </div>
    

  </div>

  
    
    <div class="beta-nhsuk-navigation-container">
      <div class="nhsuk-width-container">
        <nav class="beta-nhsuk-navigation" id="header-navigation" role="navigation" aria-label="Primary navigation">
          <ul class="beta-nhsuk-header__navigation-list">
            <li class="beta-nhsuk-header__navigation-item">
              <a class="nhsuk-header__navigation-link beta-nhsuk-header__navigation-link"  href="/conditions/">
                Health A-Z
                <svg class="nhsuk-icon nhsuk-icon__chevron-right" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" height="34" width="34">
                  <path d="M15.5 12a1 1 0 0 1-.29.71l-5 5a1 1 0 0 1-1.42-1.42l4.3-4.29-4.3-4.29a1 1 0 0 1 1.42-1.42l5 5a1 1 0 0 1 .29.71z"></path>
                </svg>
              </a>
            </li>
            <li class="beta-nhsuk-header__navigation-item beta-nhsuk-header__navigation-item--services-mobile">
              <a class="nhsuk-header__navigation-link beta-nhsuk-header__navigation-link"  href="/nhs-services/">
                NHS services
                <svg class="nhsuk-icon nhsuk-icon__chevron-right" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" height="34" width="34">
                  <path d="M15.5 12a1 1 0 0 1-.29.71l-5 5a1 1 0 0 1-1.42-1.42l4.3-4.29-4.3-4.29a1 1 0 0 1 1.42-1.42l5 5a1 1 0 0 1 .29.71z"></path>
                </svg>
              </a>
            </li>
            <li class="beta-nhsuk-header__navigation-item">
              <a class="nhsuk-header__navigation-link beta-nhsuk-header__navigation-link"  href="/live-well/">
                Live Well
                <svg class="nhsuk-icon nhsuk-icon__chevron-right" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" height="34" width="34">
                  <path d="M15.5 12a1 1 0 0 1-.29.71l-5 5a1 1 0 0 1-1.42-1.42l4.3-4.29-4.3-4.29a1 1 0 0 1 1.42-1.42l5 5a1 1 0 0 1 .29.71z"></path>
                </svg>
              </a>
            </li>
            <li class="beta-nhsuk-header__navigation-item">
              <a class="nhsuk-header__navigation-link beta-nhsuk-header__navigation-link"  href="/mental-health/">
                Mental health
                <svg class="nhsuk-icon nhsuk-icon__chevron-right" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" height="34" width="34">
                  <path d="M15.5 12a1 1 0 0 1-.29.71l-5 5a1 1 0 0 1-1.42-1.42l4.3-4.29-4.3-4.29a1 1 0 0 1 1.42-1.42l5 5a1 1 0 0 1 .29.71z"></path>
                </svg>
              </a>
            </li>
            <li class="beta-nhsuk-header__navigation-item">
              <a class="nhsuk-header__navigation-link beta-nhsuk-header__navigation-link"  href="/conditions/social-care-and-support-guide/">
                Care and support
                <svg class="nhsuk-icon nhsuk-icon__chevron-right" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" height="34" width="34">
                  <path d="M15.5 12a1 1 0 0 1-.29.71l-5 5a1 1 0 0 1-1.42-1.42l4.3-4.29-4.3-4.29a1 1 0 0 1 1.42-1.42l5 5a1 1 0 0 1 .29.71z"></path>
                </svg>
              </a>
            </li>
            <li class="beta-nhsuk-header__navigation-item">
              <a class="nhsuk-header__navigation-link beta-nhsuk-header__navigation-link"  href="/pregnancy/">
                Pregnancy
                <svg class="nhsuk-icon nhsuk-icon__chevron-right" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" height="34" width="34">
                  <path d="M15.5 12a1 1 0 0 1-.29.71l-5 5a1 1 0 0 1-1.42-1.42l4.3-4.29-4.3-4.29a1 1 0 0 1 1.42-1.42l5 5a1 1 0 0 1 .29.71z"></path>
                </svg>
              </a>
            </li>
            <li class="beta-nhsuk-header__navigation-item beta-nhsuk-header__navigation-item--home">
              <a class="nhsuk-header__navigation-link beta-nhsuk-header__navigation-link"  href="/">
                Home
                <svg class="nhsuk-icon nhsuk-icon__chevron-right" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" height="34" width="34">
                  <path d="M15.5 12a1 1 0 0 1-.29.71l-5 5a1 1 0 0 1-1.42-1.42l4.3-4.29-4.3-4.29a1 1 0 0 1 1.42-1.42l5 5a1 1 0 0 1 .29.71z"></path>
                </svg>
              </a>
            </li>
            <li class="beta-nhsuk-header__navigation-item beta-nhsuk-header__navigation-item--services">
              <a class="nhsuk-header__navigation-link beta-nhsuk-header__navigation-link"  href="/nhs-services/">
                NHS services
                <svg class="nhsuk-icon nhsuk-icon__chevron-right" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" height="34" width="34">
                  <path d="M15.5 12a1 1 0 0 1-.29.71l-5 5a1 1 0 0 1-1.42-1.42l4.3-4.29-4.3-4.29a1 1 0 0 1 1.42-1.42l5 5a1 1 0 0 1 .29.71z"></path>
                </svg>
              </a>
            </li>
            <li class="beta-mobile-menu-container">
              <button class="beta-nhsuk-header__menu-toggle nhsuk-header__navigation-link beta-nhsuk-header__navigation-link" aria-expanded="false">
                <span class="nhsuk-u-visually-hidden">Browse</span>
                More
                <svg class="nhsuk-icon beta-nhsuk-icon__chevron-down" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false">
                  <path d="M15.5 12a1 1 0 0 1-.29.71l-5 5a1 1 0 0 1-1.42-1.42l4.3-4.29-4.3-4.29a1 1 0 0 1 1.42-1.42l5 5a1 1 0 0 1 .29.71z"></path>
                </svg>

              </button>
            </li>
          </ul>
        </nav>
      </div>
    </div>

  
</header>
    

    
  



<nav class="nhsuk-breadcrumb beta-breadcrumb" aria-label="Breadcrumb">
  <div class="nhsuk-width-container">
    <ol class="nhsuk-breadcrumb__list">
      
      <li class="nhsuk-breadcrumb__item">
          <a href="/" class="nhsuk-breadcrumb__link">Home</a>
        </li>
        
      <li class="nhsuk-breadcrumb__item">
          <a href="/mental-health/" class="nhsuk-breadcrumb__link">Mental health</a>
        </li>
        
      <li class="nhsuk-breadcrumb__item">
          <a href="/mental-health/conditions/" class="nhsuk-breadcrumb__link">Mental health conditions</a>
        </li>
        
      </ol>

      <p class="nhsuk-breadcrumb__back">
        
          <a href="/mental-health/conditions/" class="nhsuk-breadcrumb__backlink">
            <span class="nhsuk-u-visually-hidden">Back to </span>
            Mental health conditions
          </a>
        
      </p>
    </div>
  </nav>




    

    <div class="nhsuk-width-container">
        <main class="nhsuk-main-wrapper nhsuk-u-padding-top-0 nhsuk-u-padding-top-0" id="maincontent"  lang="en-GB"  >
        
<div class="nhsuk-grid-row">
  <div class="nhsuk-grid-column-full">

    <div class="nhsuk-u-reading-width">

      
  <h1>
    
      Bulimia
    
  </h1>
  <p class="nhsuk-lede-text">
    Bulimia is where someone is binge eating, then making themselves vomit or using laxatives to purge the food from their body.
  </p>


    </div>

    <article>

      

  
  
  
  
  

  
    
      <section class="nhsuk-u-reading-width">
        <ul class="nhsuk-hub-key-links beta-hub-key-links">
          
            <li class="nhsuk-hub-key-links__list-item beta-hub-key-links__list-item">
              <a href="https://www.nhs.uk/mental-health/conditions/bulimia/overview/">
                Overview - Bulimia
              </a>
            </li>
          
            <li class="nhsuk-hub-key-links__list-item beta-hub-key-links__list-item">
              <a href="https://www.nhs.uk/mental-health/conditions/bulimia/symptoms/">
                Symptoms - Bulimia
              </a>
            </li>
          
            <li class="nhsuk-hub-key-links__list-item beta-hub-key-links__list-item">
              <a href="https://www.nhs.uk/mental-health/conditions/bulimia/treatment/">
                Treatment - Bulimia
              </a>
            </li>
          
        </ul>
      </section>
    
  

  <div class="nhsuk-u-reading-width nhsuk-u-margin-top-6 nhsuk-hub-after-key-links">
    
  </div>

  



    </article>

  </div>
</div>

        
      </main>
    </div>

    

    
      


<footer role="contentinfo">
  <div class="nhsuk-footer" id="nhsuk-footer">
    <div class="nhsuk-width-container">
      
      <h2 class="nhsuk-u-visually-hidden">Support links</h2>
      <div class="beta-nhsuk-footer">
        <ul class="beta-nhsuk-footer__list nhsuk-footer__list">
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/">Home</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/conditions/">Health A to Z</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/live-well/">Live Well</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/mental-health/">Mental health</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/conditions/social-care-and-support-guide/">Care and support</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/pregnancy/">Pregnancy</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/nhs-services/">NHS services</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/conditions/coronavirus-covid-19/">Coronavirus (COVID-19)</a></li>
        </ul>

        <ul class="nhsuk-footer__list beta-nhsuk-footer__list">
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/nhs-app/">NHS App</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/nhs-services/online-services/find-nhs-number/">Find my NHS number</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/nhs-services/gps/view-your-gp-health-record/">View your GP health record</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/using-the-nhs/about-the-nhs/">About the NHS</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/using-the-nhs/healthcare-abroad/apply-for-a-free-uk-global-health-insurance-card-ghic/">Healthcare abroad</a></li>
        </ul>

        <ul class="nhsuk-footer__list beta-nhsuk-footer__list">
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/contact-us/">Contact us</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/nhs-sites/">Other NHS websites</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/our-policies/profile-editor-login/">Profile editor login</a></li>
        </ul>
        
        <ul class="nhsuk-footer__list beta-nhsuk-footer__list beta-nhsuk-footer__list-policies">
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/about-us/">About us</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/accessibility-statement/">Accessibility statement</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/our-policies/">Our policies</a></li>
          <li class="beta-nhsuk-footer__list-item nhsuk-footer__list-item"><a class="beta-nhsuk-footer__list-item-link nhsuk-footer__list-item-link" href="/our-policies/cookies-policy/">Cookies</a></li>
        </ul>
      </div>

      <div>
        <p class="beta-nhsuk-footer__copyright">&copy; Crown copyright</p>
      </div>

    </div>
  </div>
</footer>

    

    
      
      
        
          <script src="https://assets.nhs.uk/scripts/login.js"></script>
        
      
    
  <script defer src="https://static.cloudflareinsights.com/beacon.min.js/v84a3a4012de94ce1a686ba8c167c359c1696973893317" integrity="sha512-euoFGowhlaLqXsPWQ48qSkBSCFs3DPRyiwVu3FjR96cMPx+Fr+gpWRhIafcHwqwCqWS42RZhIudOvEI+Ckf6MA==" data-cf-beacon='{"rayId":"84a8bb4bbdba79c4","version":"2024.1.0","token":"08a46637120a404a963395dd86986b4f"}' crossorigin="anonymous"></script>
</body>
</html>
"""

In [None]:
soup3 = BeautifulSoup(requests.get("https://www.nhs.uk/mental-health/conditions/skin-picking-disorder/").text, "html.parser")
# The page's structured data lives in its <script type="application/ld+json"> block
json.loads(soup3.find('script', type='application/ld+json').text)

In [110]:
json.loads(soup2.find('script').text)

{'@context': 'http://schema.org',
 '@type': 'MedicalWebPage',
 'about': {'@type': 'WebPage',
  'alternateName': '',
  'name': 'Skin picking disorder'},
 'author': {'@type': 'Organization',
  'email': 'nhswebsite.servicedesk@nhs.net',
  'logo': 'https://www.nhs.uk/nhscwebservices/documents/logo1.jpg',
  'name': 'NHS website',
  'url': 'https://www.nhs.uk'},
 'breadcrumb': {'@context': 'http://schema.org',
  '@type': 'BreadcrumbList',
  'itemListElement': [{'@type': 'ListItem',
    'item': {'@id': 'https://www.nhs.uk/mental-health/',
     'genre': [],
     'name': 'Mental health'},
    'position': 0},
   {'@type': 'ListItem',
    'item': {'@id': 'https://www.nhs.uk/mental-health/conditions/',
     'genre': [],
     'name': 'Mental health conditions'},
    'position': 1},
   {'@type': 'ListItem',
    'item': {'@id': 'https://www.nhs.uk/mental-health/conditions/skin-picking-disorder/',
     'genre': ['Condition'],
     'name': 'Skin picking disorder'},
    'position': 2}]},
 'copyrightHold

In [109]:
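import json
from io import StringIO

import requests
from bs4 import BeautifulSoup


def extract_text_from_main_entity(ent, text=None, key='mainEntityOfPage'):
    """Recursively collect the text of every 'markdown' item in a list of JSON-LD entities."""
    if text is None:
        text = StringIO()
    for elt in ent:
        if not isinstance(elt, dict):
            continue  # skip anything that is not a JSON object
        if elt.get("name") == "markdown":
            text.write(elt.get("text") or "")
            text.write("\n\n")  # keep consecutive blocks from running together
        nested = elt.get(key)
        if isinstance(nested, list):
            extract_text_from_main_entity(nested, text, key)
    return text


soup2 = BeautifulSoup(requests.get("https://www.nhs.uk/mental-health/conditions/skin-picking-disorder/").text, "html.parser")
page_data = json.loads(soup2.find('script', type='application/ld+json').text)

# Pass the list of entities under 'mainEntityOfPage', not the top-level dict:
# iterating a dict yields its string keys, which have no .get method
print(extract_text_from_main_entity(page_data.get('mainEntityOfPage', [])).getvalue())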
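To exercise this kind of traversal without a network call, the sketch below runs a generator-based variant over a small hand-built dictionary. The nesting (`mainEntityOfPage` lists containing items with `name == "markdown"` and a `text` field) mirrors what `extract_text_from_main_entity` assumes about nhs.uk pages; the sample content and the function name `iter_markdown_blocks` are hypothetical.

```python
import json
from typing import Iterator


def iter_markdown_blocks(items, key: str = "mainEntityOfPage") -> Iterator[str]:
    """Yield the 'text' of every item named 'markdown', descending into nested `key` lists."""
    for item in items:
        if not isinstance(item, dict):
            continue  # e.g. the string keys produced by iterating a dict
        if item.get("name") == "markdown" and item.get("text"):
            yield item["text"]
        nested = item.get(key)
        if isinstance(nested, list):
            yield from iter_markdown_blocks(nested, key)


# Hypothetical JSON-LD fragment shaped like the structure the extractor above expects
page = json.loads("""
{"@type": "MedicalWebPage",
 "mainEntityOfPage": [
   {"name": "section heading", "mainEntityOfPage": [
       {"name": "markdown", "text": "First paragraph."},
       {"name": "markdown", "text": "Second paragraph."}]},
   {"name": "markdown", "text": "Third paragraph."}]}
""")

print("\n\n".join(iter_markdown_blocks(page["mainEntityOfPage"])))
```

Yielding blocks rather than writing into a shared `StringIO` leaves the choice of separator to the caller; here a blank line keeps paragraphs from different sections apart.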

In [82]:
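import json

# Select the JSON-LD block explicitly rather than relying on it being the first <script> tag
json.loads(BeautifulSoup(soup, "html.parser").find('script', type='application/ld+json').text)['dateModified']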

'2022-03-29T10:26:45+00:00'
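Selecting `<script type="application/ld+json">` by its `type` attribute, rather than taking the first `<script>`, keeps this working if the page order changes. Within the notebook, `BeautifulSoup(..., "html.parser").find_all('script', type='application/ld+json')` does that job; the sketch below makes the same logic explicit with only the standard library's `html.parser`, so it can be tested in isolation. The class name `JsonLdParser` and the sample HTML are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks: list[dict] = []
        self._in_json_ld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several pieces, so buffer them
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


sample_html = """<html><head>
<script>window.digitalData = {};</script>
<script type="application/ld+json">{"@type": "MedicalWebPage", "dateModified": "2022-03-29T10:26:45+00:00"}</script>
</head><body></body></html>"""

parser = JsonLdParser()
parser.feed(sample_html)
parser.close()
print(parser.blocks[0]["dateModified"])
```

The ordinary `<script>` that precedes the JSON-LD block is ignored, which is exactly the case where taking the first `<script>` would fail.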


In [81]:
import requests

hub = BeautifulSoup(soup, "html.parser")
page_html = str(hub)

# Follow each key link on the hub page (Overview, Symptoms, Treatment) and append that page's HTML
for li in hub.find_all('li', class_='nhsuk-hub-key-links__list-item'):
    href = li.find('a').get('href')
    print(href)

    page_html += "\n"
    page_html += str(BeautifulSoup(requests.get(href).text, "html.parser"))

page_html

https://www.nhs.uk/mental-health/conditions/bulimia/overview/
https://www.nhs.uk/mental-health/conditions/bulimia/symptoms/
https://www.nhs.uk/mental-health/conditions/bulimia/treatment/
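The key-link hrefs on this hub page are already absolute URLs, but many other links in the snapshot (the navigation and footer links such as `/conditions/`) are site-relative, and `requests.get` cannot fetch those directly. A small sketch of resolving hrefs against the page URL, dropping `#fragment` anchors and de-duplicating before fetching; the helper name `resolve_links` and the sample hrefs (including the `#getting-help` anchor) are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def resolve_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve relative hrefs against base_url, strip #fragments, and de-duplicate in order."""
    seen = set()
    resolved = []
    for href in hrefs:
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url not in seen:
            seen.add(url)
            resolved.append(url)
    return resolved


base = "https://www.nhs.uk/mental-health/conditions/bulimia/"
print(resolve_links(base, [
    "https://www.nhs.uk/mental-health/conditions/bulimia/overview/",
    "symptoms/",
    "/mental-health/conditions/bulimia/overview/#getting-help",
]))
```

De-duplicating after stripping fragments avoids downloading the same page twice when it is linked from several anchors.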




In [54]:
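overview = BeautifulSoup(requests.get('https://www.nhs.uk/mental-health/conditions/bulimia/overview/').text, "html.parser")

# Keep only the <main id="maincontent"> element, then remove double newlines and double spaces
print(overview.find(id='maincontent').text.replace('\n\n', '').replace("  ", ""))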


Overview - Bulimia


Bulimia is an eating disorder and mental health condition.People who have bulimia go through periods where they eat a lot of food in a very short amount of time (binge eating) and then purge the food from their body to try to stop themselves gaining weight.Purging could include making themselves vomit, using laxatives (medicine to help them poo) or diuretics (medicine that makes you pee more), fasting or doing excessive exercise, or a combination of these.Anyone can get bulimia, but it is more common in young people aged 15 to 25.
Getting help for bulimiaGetting help and support as soon as possible gives you the best chance of recovering from bulimia.If you think you may have bulimia, see a GP as soon as you can.They'll ask you questions about your eating habits and how you're feeling, and will check your overall health and weight.If they think you may have bulimia or another eating disorder, they should refer you to an eating disorder specialist or team of specia
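In the output above, deleting every `'\n\n'` joins the last sentence of one paragraph to the first sentence of the next ("condition.People"), and deleting every double space can fuse words in the same way. Collapsing whitespace instead of deleting it avoids this. A minimal sketch with regular expressions; the helper name `normalise_whitespace` and the sample text are hypothetical, and BeautifulSoup's own `get_text(separator="\n", strip=True)` is another option.

```python
import re


def normalise_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and keep at most one blank line between paragraphs."""
    text = re.sub(r"[ \t]+", " ", text)    # squash horizontal whitespace to a single space
    text = re.sub(r" *\n *", "\n", text)   # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # cap consecutive blank lines at one
    return text.strip()


raw = (
    "Bulimia is an eating disorder and mental health condition.\n\n"
    "People who have bulimia  go through periods of binge eating."
)
print(raw.replace('\n\n', '').replace("  ", ""))  # replace chain: sentences and words run together
print(normalise_whitespace(raw))                   # paragraph break and word boundary preserved
```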

In [52]:

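import requests

# Without restricting to the main content, the output also includes header, navigation and breadcrumb text
print(BeautifulSoup(requests.get('https://www.nhs.uk/mental-health/conditions/bulimia/overview/').text, "html.parser").text.replace('\n\n', ''))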

Overview - Bulimia - NHS
Skip to main content
Search the NHS website
Search
                Health A-Z
                
                NHS services
                
                Live Well
                
                Mental health
                
                Care and support
                
                Pregnancy
                
                Home
                
                NHS services
                
Browse
                More
                Home
Mental health
Mental health conditions
Bulimia
Back to 
            Bulimia
          
    
      Overview - Bulimia
    
  
Bulimia is an eating disorder and mental health condition.People who have bulimia go through periods where they eat a lot of food in a very short amount of time (binge eating) and then purge the food from their body to try to stop themselves gaining weight.Purging could include making themselves vomit, using laxatives (medicine to help them poo) or diuretics (medicine that makes you pee mor