<a href="https://colab.research.google.com/github/jonbaer/googlecolab/blob/master/SEC_10_K_Filing_(Anthropic).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install llama-index anthropic "rich<13.0.1" pdf2image pytesseract nest_asyncio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
# download files
!mkdir -p data
!wget "https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1" -O data/UBER.zip
!unzip data/UBER.zip -d data

--2023-05-12 17:51:09--  https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.5.18, 2620:100:601d:18::a27d:512
Connecting to www.dropbox.com (www.dropbox.com)|162.125.5.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/dl/948jr9cfs7fgj99/UBER.zip [following]
--2023-05-12 17:51:09--  https://www.dropbox.com/s/dl/948jr9cfs7fgj99/UBER.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc0857a17a901fcb03bde45b56d1.dl.dropboxusercontent.com/cd/0/get/B77zHtNAN-IWEkcJNF3pzM1nGQnXUIVMaJ7qOz8yHolk25dO9FYHhhDyxJ-uYlp4VNTU2dqrbFN0rYmwAQQSWW_Q3cNZiEAZ4w85_yHCcihRy_t4nK6NVz3O2_tl4QvJIyjrKBeal3lZuYat32gAXawGECeEAk4_dPcWNvisaUG29Q/file?dl=1# [following]
--2023-05-12 17:51:10--  https://uc0857a17a901fcb03bde45b56d1.dl.dropboxusercontent.com/cd/0/get/B77zHtNAN-IWEkcJNF3pzM1nGQnXUIVMaJ7qOz8

In [None]:
import os

# set your Anthropic API key before running the cells below
os.environ["ANTHROPIC_API_KEY"] = ""
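
An empty key only fails later, at the first model call. As a minimal sketch, the check below fails early instead. `require_api_key` is a hypothetical helper introduced here for illustration; it is not part of llama_index or the Anthropic client.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; assign it before running the notebook")
    return key
```

Calling `require_api_key()` right after the assignment above would surface a blank key immediately.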

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import (
    GPTVectorStoreIndex, 
    GPTListIndex, 
    SimpleDirectoryReader, 
    ServiceContext, 
    LLMPredictor, 
    PromptHelper
)
from langchain.llms import Anthropic

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
# define prompt helper
# maximum input size, in tokens (the model's 100k context window)
max_input_size = 100000
# number of tokens reserved for the output
num_output = 2048
# maximum overlap between adjacent chunks, in tokens
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

# define LLM
llm_predictor = LLMPredictor(
    llm=Anthropic(model="claude-v1.3-100k", temperature=0, max_tokens_to_sample=num_output)
)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    chunk_size_limit=95000,
)
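
As a rough sanity check on these numbers, the sketch below works out how many tokens are left for the prompt template and the query once one full chunk and the output budget are reserved. It assumes input and output share the same 100,000-token window. `remaining_prompt_budget` is a hypothetical helper, not a llama_index function.

```python
def remaining_prompt_budget(max_input_size: int, num_output: int, chunk_size: int) -> int:
    """Tokens left for the prompt template and query after reserving chunk and output."""
    remaining = max_input_size - num_output - chunk_size
    if remaining < 0:
        raise ValueError("chunk size plus output budget exceeds the context window")
    return remaining


# Values taken from the cell above.
budget = remaining_prompt_budget(max_input_size=100000, num_output=2048, chunk_size=95000)
print(budget)  # 2952
```

Under that assumption, a 95,000-token chunk leaves a few thousand tokens of headroom for the question and prompt wording.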



# Load in Data 

Each filing is loaded through Unstructured.io.

In [None]:
from llama_index import download_loader
from pathlib import Path

In [None]:
UnstructuredReader = download_loader("UnstructuredReader", refresh_cache=True)

In [None]:
loader = UnstructuredReader()
doc_set = {}
all_docs = []
years = [2022, 2021, 2020, 2019]
for year in years:
    year_doc = loader.load_data(file=Path(f'./data/UBER/UBER_{year}.html'), split_documents=False)[0]
    # attach the filing year as metadata on the document
    year_doc.extra_info = {"year": year}
    doc_set[year] = year_doc
    all_docs.append(year_doc)
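
The loop above assumes every `UBER_{year}.html` file was unzipped successfully. As a minimal sketch, the paths can be checked up front so a missing filing is reported before any parsing starts. `filing_paths` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path
from typing import Dict, Iterable


def filing_paths(years: Iterable[int], root: Path) -> Dict[int, Path]:
    """Map each year to its expected HTML filing, failing early if any file is missing."""
    paths = {year: root / f"UBER_{year}.html" for year in years}
    missing = [str(path) for path in paths.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError("missing filings: " + ", ".join(missing))
    return paths
```

With the layout used in this notebook, the call would be `filing_paths(years, Path("./data/UBER"))`, and each returned path could be passed to `loader.load_data` as before.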

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


In [None]:
doc_set[2019]

Document(text='UNITED STATES\n\nSECURITIES AND EXCHANGE COMMISSION\n\nWashington, D.C. 20549\n\n____________________________________________\n\nFORM\n\n10-K\n\n____________________________________________\n\n(Mark One)\n\nANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\n\nFor the fiscal year ended\n\nDecember 31, 2019\n\nOR\n\nTRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\n\nFor the transition period from_____ to _____\n\nCommission File Number:\n\n001-38902\n\n____________________________________________\n\nUBER TECHNOLOGIES, INC.\n\n(Exact name of registrant as specified in its charter)\n\n____________________________________________\n\nDelaware\n\n45-2647441\n\n(State or other jurisdiction of incorporation or organization)\n\n(I.R.S. Employer Identification No.)\n\n1455 Market Street, 4th Floor\n\nSan Francisco\n\nCalifornia\n\n94103\n\n(Address of principal executive offices, including zip code)\n\n415\

In [None]:
# just get 2019 documents
list_index = GPTListIndex.from_documents([doc_set[2019]], service_context=service_context)

In [None]:
# the 2019 filing is split into two nodes so that each fits within the 100k context window
list_index.index_struct.nodes

['177da2d8-4bfb-4101-aa75-b9e846940533',
 '5b167f2c-37b3-4c8c-84a3-58b43f4a963b']
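
To illustrate why one filing becomes two nodes, here is a simplified sketch of fixed-size chunking with overlap. It is not the llama_index splitter, which is token-aware and respects text boundaries. The 150,000-token document length is a made-up figure for illustration, and `split_tokens` is a hypothetical helper.

```python
from typing import List


def split_tokens(tokens: List[str], chunk_size: int, overlap: int = 0) -> List[List[str]]:
    """Split tokens into chunks of at most chunk_size, sharing `overlap` tokens between neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
        start += step
    return chunks


# Hypothetical document length; chunk size and overlap are the values used above.
document = ["tok"] * 150000
chunks = split_tokens(document, chunk_size=95000, overlap=20)
print(len(chunks))  # 2
```

Any document longer than one chunk but shorter than roughly two would land in two nodes under this scheme.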

In [None]:
# NOTE: the default create/refine approach does not give good answers
query = "What were some of the biggest risk factors in 2019?"
query_engine = list_index.as_query_engine(service_context=service_context)
response = query_engine.query(query)

In [None]:
print(response)


Original answer:

Some of the biggest risk factors for Uber in 2019 included:

• Regulatory challenges and uncertainty. Uber faced regulatory challenges and uncertainty in many markets, including restrictions on its products and services, caps on pricing, and licensing requirements. For example, California's AB5 law and other similar laws increased the risk of Drivers being classified as employees. Uber also faced regulatory scrutiny and bans in London, Barcelona, and other markets.

• Competition. The markets in which Uber operates are highly competitive, and Uber faced significant competition from well-established and low-cost alternatives in 2019. Competitors also aggressively competed for Drivers and consumers by offering significant incentives and discounts. 

• Safety and security. There were risks related to the safety and security of Uber's platform, including risks from vehicle or scooter accidents, assaults, and other incidents. Uber released a safety report in 2019 detailin

In [None]:
# NOTE: tree_summarize gives better answers
query = "What were some of the biggest risk factors in 2019?"
query_engine = list_index.as_query_engine(service_context=service_context, response_mode="tree_summarize")
response = query_engine.query(query)
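
The difference between the two response modes can be sketched with a stub in place of the model. This is a simplified illustration under stated assumptions, not the llama_index implementation: the prompt wording is invented, and `refine`, `tree_summarize`, and `make_stub` are hypothetical names.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # hypothetical stand-in for a model call


def refine(query: str, chunks: List[str], llm: LLM) -> str:
    """Sequential: answer from the first chunk, then revise the answer with each later chunk."""
    if not chunks:
        raise ValueError("need at least one chunk")
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}")
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {chunk}\n"
            f"Refine the answer to: {query}"
        )
    return answer


def tree_summarize(query: str, chunks: List[str], llm: LLM, fanout: int = 2) -> str:
    """Hierarchical: answer each chunk independently, then merge answers in groups."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Leaf level: one answer per chunk.
    level = [llm(f"Context: {chunk}\nQuestion: {query}") for chunk in chunks]
    # Merge groups of partial answers until a single answer remains.
    while len(level) > 1:
        level = [
            llm("Context: " + "\n".join(level[i:i + fanout]) + f"\nQuestion: {query}")
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def make_stub() -> Tuple[LLM, List[str]]:
    """Build a fake model that records every prompt it receives."""
    prompts: List[str] = []

    def llm(prompt: str) -> str:
        prompts.append(prompt)
        return f"answer-{len(prompts)}"

    return llm, prompts


llm, refine_prompts = make_stub()
refine("q", ["chunk A", "chunk B"], llm)
llm, tree_prompts = make_stub()
tree_summarize("q", ["chunk A", "chunk B"], llm)
print(len(refine_prompts), len(tree_prompts))  # 2 3
```

In this sketch the refine path lets the second call rewrite the first answer, while the tree path reads each chunk independently and only then merges. That is one plausible reason for the difference in answer quality noted in the comments above.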

In [None]:
print(response)


• Regulatory challenges and uncertainty: Uber faced significant regulatory challenges and uncertainty in 2019, including AB5 in California which codified a new test for determining whether workers should be classified as employees or independent contractors. Uber also faced regulatory scrutiny and bans in other markets like London, UK. These regulatory issues created uncertainty and risk around Uber's business model and operations.

• Safety and security: Uber received negative publicity around safety incidents on its platform which could damage its brand and reputation. Uber released a safety report in 2019 on sexual assaults and other incidents which led to additional scrutiny. Safety and security risks remain an ongoing issue for Uber's business.

• Competition: The markets in which Uber competes are intensely competitive, and Uber faces competition from new and existing companies in the various segments it operates in like ridesharing, food delivery, and logistics. Increased compe

In [None]:
query = "What were some of the significant acquisitions?"
query_engine = list_index.as_query_engine(service_context=service_context, response_mode="tree_summarize")
response = query_engine.query(query)

In [None]:
print(response)


Uber Technologies, Inc. is a technology company that develops and operates proprietary technology applications in the United States and internationally. The company's platforms connect consumers with independent providers of ride services for ridesharing services, as well as connect consumers with restaurants and food delivery service providers for meal preparation and delivery services. Uber Technologies, Inc. was formerly known as Ubercab, Inc. and changed its name to Uber Technologies, Inc. in February 2011. The company was founded in 2009 and is headquartered in San Francisco, California.

Uber Technologies, Inc. operates through three segments:

•Mobility: Uber's Mobility segment connects consumers with independent providers of ride services for ridesharing services and other forms of transportation services, including public transit, bikes, scooters, and vehicle rentals.

•Delivery: Uber's Delivery segment connects consumers with restaurants and food delivery service providers f

In [None]:
list_index = GPTListIndex.from_documents(all_docs, service_context=service_context)
list_index.index_struct.nodes

['5cef434f-1d6a-4979-bc86-7d586f3dc483',
 '2a140528-7bb4-4372-b58e-f0b7f23c7a1f',
 '83439a3a-a040-42e2-806f-878d33d4554f',
 '9a943b4f-db23-4f69-9866-7c5c906d8255',
 '14c565b3-af6f-476b-83d4-c4ebe3dbabd2',
 '40087d34-4780-4911-b6d2-78c072d6fd5d',
 '98a7a515-2243-4019-8558-801701532fe4',
 '7e9414b5-2ff4-4ecb-9a2c-6586785b7ce2']

In [None]:
query = "How are the risk factors changing across years? Compare/contrast the risk factors across the SEC filings."
query_engine = list_index.as_query_engine(response_mode="tree_summarize", use_async=False)
response = query_engine.query(query)

In [None]:
print(response)


The risk factors disclosed in Uber's SEC filings have evolved over time based on Uber's business and industry changes. Some of the key differences in risk factors across the filings are:

2017 10-K:
- Focused heavily on risks related to negative publicity, competition, dependence on independent contractors, and regulatory challenges as Uber was still facing backlash from various PR crises and regulatory pushback. 
- Also highlighted risks from intellectual property litigation given various IP disputes at the time.

2018 10-K:
- Added more risks related to autonomous vehicles as Uber ramped up its self-driving car efforts. Specifically called out risks from accidents, technical challenges, and competition in the AV space.
- Removed some risks related to negative publicity and PR crises as those issues had subsided. But added risks related to corporate culture and workplace environment given the Fowler scandal.

2019 10-K: 
- Further expanded AV risks to include risks from partnerships 