## Install dependencies


In [1]:
!pip install -r ./requirements.txt -q

In [9]:
!pip show langchain

Name: langchain
Version: 0.0.203
Summary: Building applications with LLMs through composability
Home-page: https://www.github.com/hwchase17/langchain
Author: 
Author-email: 
License: MIT
Location: /Users/Meera/miniconda3/lib/python3.10/site-packages
Requires: aiohttp, async-timeout, dataclasses-json, langchainplus-sdk, numexpr, numpy, openapi-schema-pydantic, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


In [2]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True
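`load_dotenv` printing `True` only means that at least one variable was loaded from the `.env` file. It does not confirm that every key the later cells need is present. Below is a minimal sketch of an early check. It assumes the keys are `OPENAI_API_KEY` (the variable the OpenAI client reads by default) plus the `PINECONE_API_KEY` and `PINECONE_ENV` used further down. `missing_env_vars` is a hypothetical helper, not part of any library.

```python
import os

# Keys the later cells rely on. OPENAI_API_KEY is an assumption: it is the
# variable the OpenAI client reads when no key is passed explicitly.
REQUIRED_KEYS = ['OPENAI_API_KEY', 'PINECONE_API_KEY', 'PINECONE_ENV']

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    print(f'Missing environment variables: {", ".join(missing)}')
```

Running this right after `load_dotenv` makes a missing key fail loudly here instead of surfacing later as an authentication error from Pinecone or OpenAI.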

In [3]:
%pip install pypdf -q

Note: you may need to restart the kernel to use updated packages.


## Load the PDF Document

In [4]:
def load_document(file):
    from langchain.document_loaders import PyPDFLoader
    print(f'Loading {file}')
    loader = PyPDFLoader(file)
    data = loader.load()
    return data
    

In [5]:
data = load_document('data/Students-For-Fair-Admission-vs-Hardvard.pdf')
#print( data[1].page_content)
print(f'You have {len(data)} pages in your data')

Loading data/Students-For-Fair-Admission-vs-Hardvard.pdf
You have 237 pages in your data


## Split the Large PDF Document into Chunks

In [30]:
def chunk_data(data, chunk_size=1256):
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=10)
    chunks = text_splitter.split_documents(data)
    return chunks
    
    

In [31]:
chunks = chunk_data(data)
print(f'You have {len(chunks)} chunks')
print(chunks[0])

You have 492 chunks
page_content='1 (Slip Opinion) OCTOBER TERM, 2022 \nSyllabus \nNOTE: Where it is feasible, a syllabus (headnote) will be released, as is \nbeing done in connection with this case, at the time the opinion is issued. \nThe syllabus constitutes no part of the opinion of the Court but has been prepared by the Reporter of Decisions for the convenience of the reader. See United States  v. Detroit Timber & Lumber Co.,  200 U. S. 321, 337. \nSUPREME COURT OF THE UNITED STATES \nSyllabus \nSTUDENTS FOR FAIR ADMISSIONS, INC. v. \nPRESIDENT AND FELLOWS OF HARVARD COLLEGE \nCERTIORARI TO THE UNITED STATES COURT OF APPEALS FOR \nTHE FIRST CIRCUIT \nNo. 20–1199. Argued October 31, 2022—Decided June 29, 2023* \nHarvard College and the University of North Carolina (UNC) are two of\nthe oldest institutions of higher le arning in the United States.  Every\nyear, tens of thousands of students apply to each school; many fewer \nare admitted.  Both Harvard and UNC employ a highly select

In [32]:
print(chunks[1])

page_content='sented is whether the admissions systems used by Harvard College \nand UNC are lawful under the Equal Protection Clause of the Four-\nteenth Amendment. \nAt Harvard, each application for admission is initially screened by a\n“first reader,” who assigns a numerical  score in each of six categories: \nacademic, extracurricular, athletic, school support, personal, and over-\nall.  For the “overall” category—a composite of the five other ratings— \na first reader can and does consider the applicant’s race.  Harvard’s admissions subcommittees then review  all applications from a partic-\nular geographic area.  These regi onal subcommittees make recommen-\ndations to the full admissions committ ee, and they take an applicant’s \nrace into account.  When the 40-member full admissions committee begins its deliberations, it discusses the relative breakdown of appli-cants by race.  The goal of the process, according to Harvard’s director \nof admissions, is ensuring there is no “dr
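`RecursiveCharacterTextSplitter` tries to break on paragraph breaks first, then newlines, then spaces, and only falls back to single characters after that. As a result, its chunks usually end on a natural boundary. The sketch below deliberately ignores that behaviour and uses plain fixed-size character windows. It is a simplification, not LangChain's algorithm, meant only to show how `chunk_size` and `chunk_overlap` interact. `fixed_size_chunks` is a hypothetical helper.

```python
def fixed_size_chunks(text, chunk_size=1256, chunk_overlap=10):
    """Split text into windows of at most chunk_size characters, where each
    window repeats the last chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError('chunk_overlap must be in [0, chunk_size)')
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
    return chunks

# Toy example: 10-character windows that share 3 characters.
print(fixed_size_chunks('abcdefghijklmnopqrstuvwxyz', chunk_size=10, chunk_overlap=3))
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The notebook uses `chunk_overlap=10`, so neighbouring chunks share only about ten characters. A sentence cut at a chunk boundary is therefore rarely repeated in full. A larger overlap produces more chunks but gives better continuity between them.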

## Insert the Embeddings into Pinecone

In [None]:
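# Estimate how many tokens (and roughly what it costs) to embed the chunks.
# price_per_1k is an assumed rate per 1,000 tokens; check OpenAI's current pricing.
def print_embedding_cost(texts, price_per_1k=0.0001):
    import tiktoken
    enc = tiktoken.encoding_for_model('text-embedding-ada-002')
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f'Total tokens: {total_tokens}')
    print(f'Embedding cost in USD: {total_tokens / 1000 * price_per_1k:.6f}')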

In [33]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [34]:
import os
import pinecone
from langchain.vectorstores import Pinecone

pinecone.init(api_key=os.environ.get('PINECONE_API_KEY'), environment=os.environ.get('PINECONE_ENV'))

In [35]:
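# Create the index only if it does not already exist (re-running the cell would otherwise fail).
# dimension=1536 matches the vector size returned by text-embedding-ada-002.
index_name = "supreme-court-affirmative-action"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536, metric='cosine')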
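`metric='cosine'` tells Pinecone to rank stored vectors by cosine similarity. That score is the dot product of two vectors divided by the product of their lengths. `dimension=1536` has to equal the length of the vectors that `text-embedding-ada-002` returns. Below is a small pure-Python sketch of the metric itself. It is an illustration, not Pinecone's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same dimension')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('cosine similarity is undefined for zero vectors')
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.7071
```

Scores range from -1 (opposite direction) to 1 (same direction), and the length of the vectors does not affect the result. OpenAI's embeddings are normalised to length 1, so for them cosine similarity and the plain dot product produce the same ranking.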

In [39]:
vector_store = Pinecone.from_documents(chunks, embeddings, index_name=index_name)

In [40]:
query = 'What case was discussed in this report?'
result = vector_store.similarity_search(query)
print(result)

[Document(page_content='of the case in No. 20–1199, and issues this opinion with respect to the \ncase in No. 21–707.', metadata={'page': 208.0, 'source': 'data/Students-For-Fair-Admission-vs-Hardvard.pdf'}), Document(page_content='of the case in No. 20–1199 and joins this opinion only as it applies to the \ncase in No. 21–707.', metadata={'page': 139.0, 'source': 'data/Students-For-Fair-Admission-vs-Hardvard.pdf'}), Document(page_content='recited in a dissenting opinion in a different case decided almost a decade ago. Post,  at 29–30, n. 25 (opinion of S\nOTOMAYOR , J.); see also post,  at \n18–21 (opinion of S OTOMAYOR , J.) (further venturing beyond the trial rec -\nords to discuss data about employ ment, income, wealth, home owner -\nship, and healthcare).', metadata={'page': 119.0, 'source': 'data/Students-For-Fair-Admission-vs-Hardvard.pdf'}), Document(page_content='at 418 (opinion of Stevens, J.).', metadata={'page': 130.0, 'source': 'data/Students-For-Fair-Admission-vs-Hardvard

In [26]:
for r in result: 
    print(r.page_content)
    print('-' * 50)

cision of the case in No. 20–1199.
--------------------------------------------------
sion of the case in No. 20–1199.
--------------------------------------------------
32, n. 27 (opinion of S
OTOMAYOR , J.); cf. post, at 17 
(JACKSON , J., dissenting).  But the question in these cases
--------------------------------------------------
being done in connection with this case, at the time the opinion is issued.
--------------------------------------------------
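`similarity_search` embeds the query with the same embedding model, asks the index for the most similar chunks, and returns them as `Document` objects. LangChain's default is `k=4`, which matches the four results printed above. The sketch below performs the same ranking by brute force over an in-memory list of `(text, vector)` pairs. The chunk texts and 2-D vectors are made-up stand-ins for real 1536-dimensional embeddings. `top_k` is a hypothetical helper. Pinecone answers the same question with an approximate nearest-neighbour index rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vector, items, k=4):
    """Return the k (text, score) pairs whose vectors are most similar to query_vector."""
    scored = [(text, cosine_similarity(query_vector, vector)) for text, vector in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]

# Toy "index": hypothetical chunk texts with made-up 2-D embeddings.
index = [
    ('chunk about Harvard admissions', [0.9, 0.1]),
    ('chunk about UNC admissions', [0.8, 0.3]),
    ('chunk about the Fourteenth Amendment', [0.1, 0.9]),
]
for text, score in top_k([1.0, 0.0], index, k=2):
    print(f'{score:.3f}  {text}')
```

Ranking is purely geometric. A very short fragment such as `of the case in No. 20–1199` can therefore score highly while carrying little useful information, which is what the results above show.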


## Ask Questions

In [41]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=1)
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k': 13})
print(retriever)

chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

vectorstore=<langchain.vectorstores.pinecone.Pinecone object at 0x7fca263850c0> search_type='similarity' search_kwargs={'k': 13}
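With `chain_type="stuff"`, `RetrievalQA` pastes every retrieved chunk into a single prompt and makes one LLM call. Below is a rough sketch of that assembly, using hypothetical prompt wording (LangChain's actual template differs). It also includes a back-of-the-envelope size estimate. With `k=13` and `chunk_size=1256`, the stuffed context can reach 13 × 1256 = 16,328 characters. At the common rule of thumb of about 4 characters per English token, that is roughly 4,000 tokens, so it is worth checking against the model's context window. `build_stuff_prompt` and `estimate_context_tokens` are hypothetical helpers.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate retrieved chunk texts into one prompt (hypothetical wording)."""
    context = '\n\n'.join(chunks)
    return (
        'Use the following pieces of context to answer the question.\n\n'
        f'{context}\n\n'
        f'Question: {question}\nAnswer:'
    )

def estimate_context_tokens(k, chunk_size, chars_per_token=4):
    """Upper-bound estimate of context tokens; chars_per_token=4 is a rough heuristic."""
    return k * chunk_size // chars_per_token

prompt = build_stuff_prompt('Who is the plaintiff in the case?',
                            ['first retrieved chunk', 'second retrieved chunk'])
print(prompt)
print(estimate_context_tokens(k=13, chunk_size=1256))  # 4082
```

If the stuffed prompt gets too long, the usual levers are lowering `k`, shrinking `chunk_size`, or switching to a chain type that processes chunks separately, such as `map_reduce`.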


In [42]:
query = "What case was discussed in the report"
answer = chain.run(query)
print(answer)

The report discusses two cases: No. 20-1199, involving Students for Fair Admissions, Inc. v. President and Fellows of Harvard College, and No. 21-707, involving Students for Fair Admissions, Inc. v. University of North Carolina, et al.


In [43]:
query = "Who is plaintiff in the case"
answer = chain.run(query)
print(answer)

The plaintiff in the case is Students for Fair Admissions, Inc. (SFFA).


In [21]:
query = "What is major takeaway from the report"
answer = chain.run(query)
print(answer)

The major takeaway from the report is that race-based programs and admissions policies may not effectively address the underlying issues of racial disparities and may even perpetuate academic underperformance. These programs may benefit the well-off members of minority races more than those who are truly struggling. The report highlights the persistence of intergenerational race-based gaps in health, wealth, and well-being. It also mentions that admissions have increased for all racial minorities, including Asian Americans. However, the report points out that Harvard's race-conscious admissions policy results in fewer Asian Americans being admitted.


In [44]:
query = "Who were justice on the bench in this case?"
answer = chain.run(query)
print(answer)

The justices on the bench in this case were Justice Sotomayor, Justice Kagan, Justice Jackson, Justice Gorsuch, Justice Breyer, Justice Kavanaugh, and Chief Justice Roberts.


In [45]:
query = "Who voted for and who voted against?"
answer = chain.run(query)
print(answer)

The provided context does not explicitly state who voted for or against specific measures or amendments. Therefore, I don't have information about specific voting records for these historical events.


In [46]:
query = "Who wrote the majority opinion?"
answer = chain.run(query)
print(answer)

The given context does not mention who wrote the majority opinion.


In [47]:
query = "What was justice Sotomayor's dissenting opinion?"
answer = chain.run(query)
print(answer)

Justice Sotomayor's dissenting opinion argues against the Court's decision to limit the use of race-based affirmative action in higher education. She emphasizes the importance of racial equality and diversity, stating that progress toward equality cannot be permanently halted. She criticizes the Court for disregarding the ongoing racial inequality in society and argues that diversity in education is a fundamental value. She also addresses the consequences of a lack of diversity in leadership positions and the importance of representation. She disagrees with the majority's interpretation of previous precedents and states that race-conscious admissions can be justified to promote the educational benefits of diversity. Overall, Justice Sotomayor's dissenting opinion supports the continued use of race-based affirmative action in higher education.


In [48]:
query = "Can you list quotable quotes in the ruling?"
answer = chain.run(query)
print(answer)

Here are some quotable quotes from the ruling:

1. "Our Constitution is color-blind, and neither knows nor tolerates classes among citizens." - Justice Harlan (dissenting in Plessy)
2. "Both programs lack sufficiently focused and measurable objectives warranting the use of race, unavoidably employ race in a negative manner, involve racial stereotyping, and lack meaningful endpoints." - Opinion of the Court
3. "A system of government that visibly lacks a path to leadership open to every race cannot withstand scrutiny 'in the eyes of the citizenry.'" - Justice Sotomayor (dissenting)
4. "Under a faithful application of the Court’s settled legal framework, Harvard and UNC’s admissions programs are constitutional and comply with Title VI of the Civil Rights Act of 1964." - Justice Sotomayor (dissenting)
5. "The Equal Protection Clause of the Fourteenth Amendment enshrines a guarantee of racial equality. The Court long ago concluded that this guarantee can


In [49]:
query = "What is proposed admission process in the supreme court ruling?"
answer = chain.run(query)
print(answer)

The Supreme Court ruling does not propose a specific admission process. The ruling is focused on whether the admissions systems used by Harvard College and the University of North Carolina are lawful under the Equal Protection Clause of the Fourteenth Amendment. The Court concludes that the use of race in the admissions processes of these universities is not permissible. The ruling does not provide an alternative or proposed admission process.
