# POV: You are a Policy Analyst at a Federal Agency. A new Executive Order about AI has been released, and your Agency needs to know the action items and the timeline for each.

Demos

1) Find all the items associated with privacy.
2) Extract any action with a deadline.
3) Produce a summary of the document.
4) Produce a summary based on a specific topic.

Links:
- https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
- https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/

In [1]:
import re
import requests
from bs4 import BeautifulSoup

from IPython.display import HTML, display
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

In [2]:
AIEO = 'https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/'

In [3]:
content = requests.get(AIEO, timeout=30)  # fail fast instead of hanging on a stalled connection

In [4]:
content

<Response [200]>

In [5]:
soup = BeautifulSoup(content.text, 'html.parser')  # name the parser explicitly for consistent results

In [6]:
print(soup.prettify())

<!DOCTYPE html>
<html class="no-js alert__has-cookie" lang="en-US">
 <head>
  <meta charset="utf-8"/>
  <meta content="notranslate" name="google"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <link href="https://gmpg.org/xfn/11" rel="profile"/>
  <!-- If you're reading this, we need your help building back better. https://usds.gov/ -->
  <meta content="index, follow, max-image-preview:large, max-snippet:-1, max-video-preview:-1" name="robots"/>
  <title>
   Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence | The White House
  </title>
  <link href="https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/" rel="canonical"/>
  <meta content="en_US" property="og:locale"/>
  <meta content="article" property="og:type"/>
  <meta content="Execut

In [7]:
soup.find('body').getText()

"\n\n\nSkip to content\n\n\nYou have JavaScript disabled. Please enable JavaScript to use this feature.\n\n\n\nToggle High Contrast\n\n\n\n\n\t\tToggle High Contrast\t\n\n\n\n\n\n\nToggle Large Font Size\n\n\n\n\n\t\tToggle Large Font Size\t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe White House\n\n\nThe White House\n \n\n\n\n\nThe White House\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\t\t\t\t\t\t\t\tHome\t\t\t\t\t\t\t\n\n\nAdministration\nPriorities\nThe Record\nBriefing Room\nEspañol\n \n\n\n\nInstagramOpens in a new window\nFacebookOpens in a new window\nTwitterOpens in a new window\nYouTubeOpens in a new window\n \n\n\nContact Us\nPrivacy Policy\nCopyright Policy\nAccessibility Statement\n \n\n\n\n\n\n\n\n\n\nMenu\nClose\n\n\n\n\n\n\n\n\nTo search this site, enter a search term\n\n \n\n\nSearch\n\n\n\n\n\n\nMobile Menu Overlay\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAdministration\n\nShow submenu for “Administration””\n\

In [8]:
soup.find('body').find_all('p')

[<p>
 <strong>The White House</strong><br/>
 								1600 Pennsylvania Ave NW<br/>
 								Washington, DC 20500
 							</p>,
 <p>     By the authority vested in me as President by the Constitution and the laws of the United States of America, it is hereby ordered as follows:</p>,
 <p>     Section 1.  Purpose.  Artificial intelligence (AI) holds extraordinary potential for both promise and peril.  Responsible AI use has the potential to help solve urgent challenges while making our world more prosperous, productive, innovative, and secure.  At the same time, irresponsible use could exacerbate societal harms such as fraud, discrimination, bias, and disinformation; displace and disempower workers; stifle competition; and pose risks to national security.  Harnessing AI for good and realizing its myriad benefits requires mitigating its substantial risks.  This endeavor demands a society-wide effort that includes government, the private sector, academia, and civil society.</p>,
 <p>     My

In [9]:
# Concatenate the text of every paragraph, dropping non-ASCII characters
text = ''
for line in soup.find('body').find_all('p'):
    text = text + line.getText().encode('ascii', 'ignore').decode('ascii')

In [10]:
print(text)


The White House
								1600 Pennsylvania Ave NW
								Washington, DC 20500
							  By the authority vested in me as President by the Constitution and the laws of the United States of America, it is hereby ordered as follows:  Section 1. Purpose. Artificial intelligence (AI) holds extraordinary potential for both promise and peril. Responsible AI use has the potential to help solve urgent challenges while making our world more prosperous, productive, innovative, and secure. At the same time, irresponsible use could exacerbate societal harms such as fraud, discrimination, bias, and disinformation; displace and disempower workers; stifle competition; and pose risks to national security. Harnessing AI for good and realizing its myriad benefits requires mitigating its substantial risks. This endeavor demands a society-wide effort that includes government, the private sector, academia, and civil society.  My Administration places the highest urgency on governing the development and use 
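
The ASCII round-trip above silently drops typographic characters, which is why apostrophes and dashes are missing from the printed text (for example "Americans privacy" in the search output further down). A minimal sketch of one alternative, assuming it is acceptable to map common typographic punctuation to ASCII equivalents before discarding anything; `PUNCT_MAP` and `to_ascii` are hypothetical names, not part of the notebook:

```python
import unicodedata

# Hypothetical mapping of common typographic punctuation to ASCII equivalents
PUNCT_MAP = str.maketrans({
    '\u2018': "'",   # left single quotation mark
    '\u2019': "'",   # right single quotation mark / apostrophe
    '\u201c': '"',   # left double quotation mark
    '\u201d': '"',   # right double quotation mark
    '\u2013': '-',   # en dash
    '\u2014': '-',   # em dash
    '\u00a0': ' ',   # non-breaking space
})

def to_ascii(raw):
    """Map typographic punctuation to ASCII, then drop any remaining non-ASCII."""
    mapped = raw.translate(PUNCT_MAP)
    # NFKD splits accented letters into base letter + combining mark,
    # so the base letter survives the ASCII encode below
    decomposed = unicodedata.normalize('NFKD', mapped)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('Americans\u2019 privacy and civil liberties must be protected'))
```

Swapping this in for the `encode('ascii', 'ignore')` call would change the downstream outputs slightly, since apostrophes and dashes would then be kept.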

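In [ ]:
# Split the order into sections: a new section starts at "Section 1." or "Sec. N."
sections = []
secText = ''

for line in soup.find('body').find_all('p'):
    if ('Sec. ' in line.getText()) or ('Section 1.' in line.getText()):
        # Close the section collected so far and start a new one
        sections.append(secText)
        secText = ''

    lineText = line.getText().encode('ascii', 'ignore').decode('ascii')
    secText = secText + lineText

# Append the final section once the loop ends
sections.append(secText)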
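
The substring test above starts a new section whenever "Sec. " appears anywhere in a paragraph, including cross-references in the middle of a sentence. A sketch of a stricter variant that treats a paragraph as a section start only when the marker opens it; `split_into_sections`, `SECTION_START`, and the sample paragraphs are illustrative assumptions, and the real page may need additional patterns:

```python
import re

# A section starts when the paragraph opens with "Section 1." or "Sec. <n>."
SECTION_START = re.compile(r'^\s*(Section\s+1\.|Sec\.\s*\d+\.)')

def split_into_sections(paragraphs):
    """Group paragraph strings into sections; index 0 holds any preamble."""
    sections = [[]]
    for paragraph in paragraphs:
        if SECTION_START.match(paragraph):
            sections.append([])
        sections[-1].append(paragraph.strip())
    return [' '.join(parts) for parts in sections]

# Made-up paragraphs in the style of the order, for illustration only
sample = [
    'By the authority vested in me as President, it is hereby ordered as follows:',
    '     Section 1.  Purpose.  Artificial intelligence (AI) holds extraordinary potential.',
    '     Sec. 2.  Policy and Principles.  Example body text.',
    '     (a)  As described in Sec. 4 of this order, agencies shall act.',
]
for number, section in enumerate(split_into_sections(sample)):
    print(number, section[:60])
```

With this pattern the last sample paragraph stays inside section 2, because its "Sec. 4" reference does not open the paragraph.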

In [11]:
import nltk
from nltk.tokenize import sent_tokenize
nltk.download('punkt', quiet=True)  # tokenizer data; newer NLTK releases may need 'punkt_tab'

# Find the portions of the text that mention a specific topic

In [12]:
def search_and_return_sentences(document, search_term):
    # Tokenize the document into sentences
    sentences = sent_tokenize(document)

    # Use regular expression to find sentences containing the search term
    matching_sentences = [sentence for sentence in sentences if re.search(r'\b{}\b'.format(re.escape(search_term)), sentence, flags=re.IGNORECASE)]

    return matching_sentences

In [13]:
search_term = "Privacy"
result_sentences = search_and_return_sentences(text, search_term)

# Display the sentences containing the search term
for sentence in result_sentences:
    print(sentence)
    print('\n')

The Federal Government will enforce existing consumer protection laws and principles and enact appropriate safeguards against fraud, unintended bias, discrimination, infringements on privacy, and other harms from AI.


(f) Americans privacy and civil liberties must be protected as AI continues advancing.


To combat this risk, the Federal Government will ensure that the collection, use, and retention of data is lawful, is secure, and mitigates privacy and confidentiality risks.


Agencies shall use available policy and technical tools, including privacy-enhancing technologies (PETs) where appropriate, to protect privacy and to combat the broader legal and societal risks  including the chilling of First Amendment rights  that result from the improper collection and use of peoples data.


(j) The term differential-privacy guarantee means protections that allow information about a group to be shared while provably limiting the improper access, use, or disclosure of personal information ab
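
The `\b` anchors in the pattern above match at any transition between a word character and a non-word character, so hyphenated compounds such as "differential-privacy" still count as hits, while longer words that merely contain the term do not. A small sketch of that behavior with made-up sentences; `contains_term` is a hypothetical helper that mirrors the notebook's pattern:

```python
import re

def contains_term(sentence, term):
    """Return True when `term` appears as a whole word, ignoring case."""
    pattern = r'\b{}\b'.format(re.escape(term))
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

examples = [
    'Agencies shall protect privacy.',                   # plain match
    'Use privacy-enhancing technologies (PETs).',        # hyphen after the term
    'The term differential-privacy guarantee means...',  # hyphen before the term
    'PRIVACY comes first.',                              # case-insensitive match
    'Agencies shall protect civil liberties.',           # term absent
]
for sentence in examples:
    print(contains_term(sentence, 'privacy'), '|', sentence)

# A short term does not match inside a longer word
print(contains_term('Artificial intelligence', 'art'))
```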

# Find the portions of the text that mention a specific topic and highlight the topic

In [14]:
def highlight_search_term(sentence, search_term):
    # Use HTML to highlight the search term in the sentence.
    # The replacement function re-inserts the matched text itself, so the
    # original capitalization of the sentence is preserved.
    pattern = r'\b{}\b'.format(re.escape(search_term))
    return re.sub(pattern,
                  lambda m: '<span style="color: red; font-weight: bold;">{}</span>'.format(m.group(0)),
                  sentence, flags=re.IGNORECASE)

In [15]:
def search_and_return_highlighted_sentences(document, search_term):
    # Tokenize the document into sentences
    sentences = sent_tokenize(document)

    # Filter and highlight the sentences that contain the search term
    matching_sentences = [highlight_search_term(sentence, search_term) for sentence in sentences if re.search(r'\b{}\b'.format(re.escape(search_term)), sentence, flags=re.IGNORECASE)]

    # Display the highlighted sentences as HTML
    display(HTML('<br><br>'.join(matching_sentences)))

In [16]:
search_term = "justice"
search_and_return_highlighted_sentences(text, search_term)

In [17]:
search_term = "privacy"
search_and_return_highlighted_sentences(text, search_term)
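
Because the highlighted sentences are rendered as raw HTML, any `<`, `>`, or `&` in the source text would be interpreted as markup. A minimal sketch of a highlighter that escapes the sentence first and then wraps whole-word matches; `highlight_escaped` is a hypothetical name, and the sketch assumes the search term itself contains no characters that `html.escape` would alter:

```python
import html
import re

def highlight_escaped(sentence, term, color='red'):
    """Escape `sentence` for HTML, then wrap whole-word matches of `term` in a span."""
    safe = html.escape(sentence)
    pattern = re.compile(r'\b{}\b'.format(re.escape(term)), flags=re.IGNORECASE)
    # A function replacement re-inserts the text exactly as it was matched,
    # so a capitalized match at the start of a sentence keeps its capital
    return pattern.sub(
        lambda m: '<span style="color: {}; font-weight: bold;">{}</span>'.format(color, m.group(0)),
        safe,
    )

print(highlight_escaped('Privacy & security: risk < benefit?', 'privacy'))
```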

# Find time-sensitive information

In [18]:
def extract_day_values(sentence):
    # Use regular expression to extract day values from the sentence
    matches = re.findall(r'\b(\d+)\s*days?\b', sentence, flags=re.IGNORECASE)
    return [int(match) for match in matches]

In [19]:
def highlight_additional_term(sentence, additional_term):
    # Use HTML to highlight the additional term in the sentence.
    # The replacement function re-inserts the matched text itself, so the
    # original capitalization of the sentence is preserved.
    pattern = r'\b{}\b'.format(re.escape(additional_term))
    return re.sub(pattern,
                  lambda m: '<span style="color: blue; font-weight: bold;">{}</span>'.format(m.group(0)),
                  sentence, flags=re.IGNORECASE)

In [20]:
def search_and_return_highlighted_time_mandates(document, additional_term=None):
    # Tokenize the document into sentences
    sentences = sent_tokenize(document)

    # Extract and store day values along with sentences
    sentences_with_days = {}
    for sentence in sentences:
        day_values = extract_day_values(sentence)
        if day_values:
            for day_value in day_values:
                if day_value not in sentences_with_days:
                    sentences_with_days[day_value] = []
                sentences_with_days[day_value].append(sentence)
    # Sort day values in ascending order
    sorted_days = sorted(sentences_with_days.keys())

    # Display sentences for each day value
    for day_value in sorted_days:
        # Print the number of days in bold
        display(HTML(f'<span style="font-weight: bold;">{day_value}-day mandates:</span>'))

        # Display the sentences that contain the specified amount of days
        for sentence in sentences_with_days[day_value]:
            highlighted_sentence = highlight_search_term(sentence, f'{day_value} days')
            if additional_term:
                highlighted_sentence = highlight_additional_term(highlighted_sentence, additional_term)
            display(HTML(highlighted_sentence))

In [21]:
search_and_return_highlighted_time_mandates(text, 'responsible')
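
The mandates are expressed as day counts relative to "the date of this order". To turn them into calendar dates for a timeline, one option is to add each count to the order date. The sketch below assumes that date is October 30, 2023, read from the date segment of the URL above; `ORDER_DATE` and `deadline_for` are hypothetical names, and the result is a plain calendar-day offset, not a legal determination of when an action is due.

```python
import re
from datetime import date, timedelta

# Assumption: the order date is taken from the URL path (2023/10/30)
ORDER_DATE = date(2023, 10, 30)

def deadline_for(day_count, start=ORDER_DATE):
    """Return the calendar date `day_count` days after `start`."""
    return start + timedelta(days=day_count)

sentence = ('(ii) Within 90 days of the date of this order, the Secretary of '
            'Transportation shall direct appropriate Federal Advisory Committees '
            'of the DOT to provide advice on the safe and responsible use of AI '
            'in transportation.')

# Same pattern as extract_day_values in the notebook
for match in re.findall(r'\b(\d+)\s*days?\b', sentence, flags=re.IGNORECASE):
    days = int(match)
    print(f'{days} days -> {deadline_for(days).isoformat()}')
```

Sorting the mandates by the computed date gives the same order as sorting by day count, but the dates are easier to drop into a planning calendar.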

# Summarize the document as a whole, or summarize the results for a specific keyword

In [22]:
# https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [23]:
def extractive_summarization_with_keyword(document, keyword=None, num_sentences=3):
    # Tokenize the document into sentences
    sentences = sent_tokenize(document)

    # Create TF-IDF matrix
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(sentences)

    # Score each sentence
    if keyword:
        # Score by cosine similarity between each sentence and the keyword
        keyword_vector = vectorizer.transform([keyword])
        similarity_scores = cosine_similarity(tfidf_matrix, keyword_vector)
        scores = similarity_scores.flatten()
    else:
        # Score by total cosine similarity to every sentence in the document
        # (a simple centrality measure, not PageRank)
        similarity_matrix = cosine_similarity(tfidf_matrix, tfidf_matrix)
        scores = similarity_matrix.sum(axis=1)

    # Rank sentences from highest to lowest score
    ranked_sentences = sorted(zip(scores, sentences), reverse=True)

    # Select the top N sentences for the summary
    summary_sentences = [sentence for _, sentence in ranked_sentences[:num_sentences]]
    summary = ' '.join(summary_sentences)

    return summary
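
The function above ranks sentences by summed cosine similarity (or by similarity to the keyword); it does not run PageRank. For comparison, here is a minimal pure-Python sketch of PageRank by power iteration over a sentence-similarity matrix, in the spirit of TextRank. `pagerank_scores`, the damping factor of 0.85, and the toy matrix are illustrative assumptions, not the notebook's method:

```python
def pagerank_scores(similarity, damping=0.85, iterations=100, tol=1e-9):
    """Score nodes of a weighted graph given as a square list-of-lists matrix."""
    n = len(similarity)
    # Ignore self-similarity and normalize each row into transition weights
    weights = []
    for i, row in enumerate(similarity):
        cleaned = [0.0 if i == j else float(v) for j, v in enumerate(row)]
        total = sum(cleaned)
        # A sentence with no similar neighbors spreads its weight evenly
        weights.append([v / total for v in cleaned] if total > 0 else [1.0 / n] * n)

    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = [
            (1 - damping) / n
            + damping * sum(scores[j] * weights[j][i] for j in range(n))
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores

# Toy similarity matrix: sentence 0 is similar to both others,
# while sentences 1 and 2 are not similar to each other
toy = [
    [1.0, 0.6, 0.6],
    [0.6, 1.0, 0.0],
    [0.6, 0.0, 1.0],
]
scores = pagerank_scores(toy)
print([round(s, 3) for s in scores])
```

In the notebook, the matrix returned by `cosine_similarity(tfidf_matrix, tfidf_matrix)` could be passed in after converting it with `.tolist()`, though for a long document a vectorized implementation would be faster.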


In [24]:
# Set the keyword for the search
keyword_to_search = "responsible"

In [25]:
# Set the number of sentences for the summary
num_summary_sentences = 10

# Perform extractive summarization with keyword search
summary = extractive_summarization_with_keyword(text, keyword_to_search, num_sentences=num_summary_sentences)

In [26]:
# Display the summary
print("\nSummary (related to the keyword '{}'):\n".format(keyword_to_search), summary)


Summary (related to the keyword 'responsible'):
 (c) The responsible development and use of AI require a commitment to supporting American workers. (d) To help ensure the responsible development and deployment of AI in the education sector, the Secretary of Education shall, within 365 days of the date of this order, develop resources, policies, and guidance regarding AI. Responsible AI use has the potential to help solve urgent challenges while making our world more prosperous, productive, innovative, and secure. These resources shall address safe, responsible, and nondiscriminatory uses of AI in education, including the impact AI systems have on vulnerable and underserved communities, and shall be developed in consultation with stakeholders as appropriate. (ii) Within 90 days of the date of this order, the Secretary of Transportation shall direct appropriate Federal Advisory Committees of the DOT to provide advice on the safe and responsible use of AI in transportation. (b) Promoting