# Supreme Court - Intellectual Property - Topic Model

## I. Data Arrangement and Cleaning

### Reading in and Merging the Original Excel Sheets

In [1]:
#Import the pandas, numpy, and glob packages for the transformations and analyses below
import pandas as pd
import numpy as np
import glob

#Locating the excel file on my computer, and defining as a python/pandas object we can access and manipulate
file = "../Supreme_Court_Project/Supreme Court Cases Corpus_v1.xlsx"

x1 = pd.ExcelFile(file)

#Printing the sheet names so I know the number and name of the tables I need to access
print(x1.sheet_names)

#creating a dataframe for each sheet, essentially spreadsheets that python can manipulate, transform, and analyze
df1 = x1.parse("Trademark")
df2 = x1.parse("Copyright")
df3 = x1.parse("Patent")

['Trademark', 'Copyright', 'Patent']


In [2]:
#Look at the trademark dataframe to make sure it was read in correctly. It wasn't: the real column headers
#were bumped down into the first data row (index 0), so we have to promote that row to the header
df1

Unnamed: 0,https://en.wikipedia.org/wiki/List_of_United_States_Supreme_Court_trademark_case_law,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7
0,Case name,IP,Date,Author,Joined By,Part of opinion,Suffix,Text file name
1,American Needle Inc. v. NFL,Trademark,2010,Stevens,9,Main,,americanneedle
2,"KP Permanent Makeup, Inc. v. Lasting Impressio...",Trademark,2004,Souter,9,Main,,kppermanentmakeup
3,"Moseley v. V Secret Catalogue, Inc.",Trademark,2003,Stevens,9,Main,,moseley
4,"Moseley v. V Secret Catalogue, Inc.",Trademark,2003,Kennedy,1,Concurrence,c1,moseleyc
5,Dastar Corp. v. Twentieth Century Fox Film Corp.,Trademark,2003,Scalia,8,Main,,dastar
6,"TrafFix Devices, Inc. v. Marketing Displays, Inc.",Trademark,2001,Kennedy,9,Main,,traffixdevices
7,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Stevens,7,Main,,cooperindustries
8,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Thomas,1,Concurrence,c1,cooperindustriesc1
9,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Scalia,1,Concurrence,c2,cooperindustriesc2


In [3]:
#here, we are re-indexing the dataframe so that the original column category headers are organizing the dataframe
new_header = df1.iloc[0] #grab the first row for the header
df1.columns = new_header #set the header row as the df header

#Here we resave the dataframe to include our changes
df1 = df1[1:] #take the data minus the original header row, getting rid of the "Unnamed" headings
df1

Unnamed: 0,Case name,IP,Date,Author,Joined By,Part of opinion,Suffix,Text file name
1,American Needle Inc. v. NFL,Trademark,2010,Stevens,9,Main,,americanneedle
2,"KP Permanent Makeup, Inc. v. Lasting Impressio...",Trademark,2004,Souter,9,Main,,kppermanentmakeup
3,"Moseley v. V Secret Catalogue, Inc.",Trademark,2003,Stevens,9,Main,,moseley
4,"Moseley v. V Secret Catalogue, Inc.",Trademark,2003,Kennedy,1,Concurrence,c1,moseleyc
5,Dastar Corp. v. Twentieth Century Fox Film Corp.,Trademark,2003,Scalia,8,Main,,dastar
6,"TrafFix Devices, Inc. v. Marketing Displays, Inc.",Trademark,2001,Kennedy,9,Main,,traffixdevices
7,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Stevens,7,Main,,cooperindustries
8,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Thomas,1,Concurrence,c1,cooperindustriesc1
9,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Scalia,1,Concurrence,c2,cooperindustriesc2
10,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Ginsburg,1,Dissent,d1,cooperindustriesd1
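
The header fix above can be wrapped in a small helper. This is a minimal sketch on a made-up three-row table, not the real spreadsheet; the function name `promote_first_row_to_header` is hypothetical and only for illustration.

```python
import pandas as pd


def promote_first_row_to_header(df):
    """Return a copy of df whose first data row becomes the column labels."""
    out = df.copy()
    out.columns = out.iloc[0]                  # first row holds the real headers
    out = out.iloc[1:].reset_index(drop=True)  # drop that row from the data
    out.columns.name = None                    # remove the leftover row label
    return out


# Toy stand-in for a sheet whose headers were read in as data
raw = pd.DataFrame([
    ["Case name", "IP", "Text file name"],
    ["American Needle Inc. v. NFL", "Trademark", "americanneedle"],
    ["Dastar Corp. v. Twentieth Century Fox Film Corp.", "Trademark", "dastar"],
])

fixed = promote_first_row_to_header(raw)
print(fixed.columns.tolist())
```

Unlike the cell above, the helper also resets the row index, so the first case starts at 0 rather than 1.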


In [4]:
#Here we check if the copyright dataframe is structured correctly, it is
df2

Unnamed: 0,Case name,IP,Date,Author,Joined By,Part of opinion,Suffix,Text file name
0,"Star Athletica, LLC v. Varsity Brands, Inc.",Copyright,2017,Thomas,6,Main,,starathletica
1,"Star Athletica, LLC v. Varsity Brands, Inc.",Copyright,2017,Ginsburg,1,Concurrence,c1,starathleticac1
2,"Star Athletica, LLC v. Varsity Brands, Inc.",Copyright,2017,Breyer,2,Dissent,d1,starathleticad1
3,"American Broadcasting Cos., Inc. v. Aereo, Inc.",Copyright,2014,Breyer,6,Main,,aereo
4,"American Broadcasting Cos., Inc. v. Aereo, Inc.",Copyright,2014,Scalia,3,Dissent,d1,aereod1
5,"Petrella v. Metro-Goldwyn-Mayer, Inc.",Copyright,2014,Ginsburg,6,Main,,petrella
6,"Petrella v. Metro-Goldwyn-Mayer, Inc.",Copyright,2014,Breyer,3,Dissent,d1,petrellad1
7,"Kirtsaeng v. John Wiley & Sons, Inc.",Copyright,2013,Breyer,6,Main,,kirtsaeng
8,"Kirtsaeng v. John Wiley & Sons, Inc.",Copyright,2013,Kagan,2,Concurrence,c1,kirtsaengc1
9,"Kirtsaeng v. John Wiley & Sons, Inc.",Copyright,2013,Ginsburg,3,Dissent,d1,kirtsaengd1


In [5]:
#Checking the patent dataframe, it looks good
df3

Unnamed: 0,Case name,IP,Date,Author,Joined By,Part of opinion,Suffix,Text file name
0,Samsung Electronics Co. v. Apple Inc.,Patent,2016,Sotomayor,8,Main,,samsung
1,Halo Electronics v. Pulse Electronics Inc.,Patent,2016,Roberts,9,Main,,haloelectronics
2,Halo Electronics v. Pulse Electronics Inc.,Patent,2016,Breyer,3,Concurrence,c1,haloelectronicsc1
3,"Kimble v. Marvel Entertainment, LLC",Patent,2015,Kagan,6,Main,,kimble
4,"Kimble v. Marvel Entertainment, LLC",Patent,2015,Alito,3,Dissent,d1,kimbled1
5,Commil v. Cisco,Patent,2015,Kennedy,6,Main,,commilusa
6,Commil v. Cisco,Patent,2015,Scalia,2,Dissent,d1,commilusad1
7,"Teva Pharmaceuticals USA, Inc. v. Sandoz, Inc.",Patent,2015,Breyer,7,Main,,tevapharmaceuticals
8,"Teva Pharmaceuticals USA, Inc. v. Sandoz, Inc.",Patent,2015,Thomas,2,Dissent,d1,tevapharmaceuticalsd1
9,"Nautilus, Inc. v. Biosig Instruments, Inc.",Patent,2014,Ginsburg,9,Main,,nautilus


In [6]:
#We now have to merge these sheets into one dataframe, since the original excel file split them into three sheets

#Here, I am merely expanding our view of the dataframe so we can see all the entries, as opposed to the
#truncated versions above
pd.set_option('display.max_rows', 160)

#I make a list of the three dataframes from above
frames = [df1, df2, df3]

#I make a new dataframe by concatenating or combining the three dataframes in the list I just created
df_cases = pd.concat(frames)

#I reset the index so that the row numbers reset in numerical order
df_cases = df_cases.reset_index(drop = True)

#I show the dataframe to check it worked correctly, it did
df_cases

Unnamed: 0,Case name,IP,Date,Author,Joined By,Part of opinion,Suffix,Text file name
0,American Needle Inc. v. NFL,Trademark,2010,Stevens,9,Main,,americanneedle
1,"KP Permanent Makeup, Inc. v. Lasting Impressio...",Trademark,2004,Souter,9,Main,,kppermanentmakeup
2,"Moseley v. V Secret Catalogue, Inc.",Trademark,2003,Stevens,9,Main,,moseley
3,"Moseley v. V Secret Catalogue, Inc.",Trademark,2003,Kennedy,1,Concurrence,c1,moseleyc
4,Dastar Corp. v. Twentieth Century Fox Film Corp.,Trademark,2003,Scalia,8,Main,,dastar
5,"TrafFix Devices, Inc. v. Marketing Displays, Inc.",Trademark,2001,Kennedy,9,Main,,traffixdevices
6,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Stevens,7,Main,,cooperindustries
7,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Thomas,1,Concurrence,c1,cooperindustriesc1
8,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Scalia,1,Concurrence,c2,cooperindustriesc2
9,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Ginsburg,1,Dissent,d1,cooperindustriesd1
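
As a rough sketch of the same combining step, `pd.concat` can renumber the rows itself via `ignore_index=True`, and a row-count check confirms nothing was lost. The three small dataframes below are made-up placeholders, not the real sheets.

```python
import pandas as pd

# Toy stand-ins for the three sheets (case names are placeholders)
trademark = pd.DataFrame({"Case name": ["Case A", "Case B"], "IP": ["Trademark"] * 2})
copyright_ = pd.DataFrame({"Case name": ["Case C"], "IP": ["Copyright"]})
patent = pd.DataFrame({"Case name": ["Case D", "Case E"], "IP": ["Patent"] * 2})

frames = [trademark, copyright_, patent]

# ignore_index=True renumbers the rows 0..n-1, like reset_index(drop=True)
combined = pd.concat(frames, ignore_index=True)

# Sanity check: the combined table should have one row per input row
expected_rows = sum(len(frame) for frame in frames)
print(len(combined), expected_rows)
```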


In [7]:
#I rename the "Text file name" column to "filename" as it is easier to work with in the future. In general,
#column names are easier to handle when they are lowercase and have no spaces (underscores are fine)
df_cases.rename(columns={"Text file name":"filename"}, inplace=True)

#I sort the dataframe first by IP type and then alphabetically for comparison against the original excel file, making
#sure we have all the case entries, we do
df_cases.sort_values(by=["IP","filename"])

Unnamed: 0,Case name,IP,Date,Author,Joined By,Part of opinion,Suffix,filename
33,"American Broadcasting Cos., Inc. v. Aereo, Inc.",Copyright,2014,Breyer,6,Main,,aereo
34,"American Broadcasting Cos., Inc. v. Aereo, Inc.",Copyright,2014,Scalia,3,Dissent,d1,aereod1
58,"Campbell v. Acuff-Rose Music, Inc.",Copyright,1994,Souter,9,Main,,campbell
59,"Campbell v. Acuff-Rose Music, Inc.",Copyright,1994,Kennedy,1,Concurrence,c1,campbellc1
66,Community for Creative Non-Violence v. Reid,Copyright,1989,Marshall,9,Main,,ccnv
48,Eldred v. Ashcroft,Copyright,2003,Ginsburg,8,Main,,eldred
49,Eldred v. Ashcroft,Copyright,2003,Stevens,1,Dissent,d1,eldredd1
50,Eldred v. Ashcroft,Copyright,2003,Breyer,1,Dissent,d2,eldredd2
62,"Feist Publications, Inc. v. Rural Telephone Se...",Copyright,1991,O'Connor,9,Main,,feist
53,"Feltner v. Columbia Pictures Television, Inc.",Copyright,1998,Thomas,9,Main,,feltner


### Reading in the Text Data

In [8]:
#I use the glob package to create a list of all the text files in the Supreme Court Corpus .txt files folder
#This list can be used to run operations on all these files at once, for our purposes, we need it to read them in
filelist = glob.glob("../Supreme_Court_Project/Supreme Court Corpus .txt files/*.txt")

In [9]:
filelist

['../Supreme_Court_Project/Supreme Court Corpus .txt files/aereo.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/aereod1.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/alice.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/alicec1.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/americanneedle.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/asgrowseed.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/asgrowseedd1.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/associationformolecularpathology.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/associationformolecularpathologyc1.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/bilski.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/bilskic1.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/bilskic2.txt',
 '../Supreme_Court_Project/Supreme Court Corpus .txt files/b

In [10]:
#So now we want to read in the text from our files, and attach them to the file names

#We create an empty list that will contain our file names
filenamelist = []

#We create an empty list that will contain our text
textlist = []

#We create a for loop which fills these lists. The textlist part caused all the initial trouble with
#encoding issues: the files turned out to be "mac-roman" encoded, but we didn't know that at the outset,
#which led to me trying a variety of different encodings
for file in filelist:
    filenamelist.append(file)
    #open each file in a with-block so it is closed again once it has been read
    with open(file, encoding="mac-roman") as f:
        textlist.append(f.read())

#We print the first ten file names, they are still in their long directory format
print(filenamelist[:10])

['../Supreme_Court_Project/Supreme Court Corpus .txt files/aereo.txt', '../Supreme_Court_Project/Supreme Court Corpus .txt files/aereod1.txt', '../Supreme_Court_Project/Supreme Court Corpus .txt files/alice.txt', '../Supreme_Court_Project/Supreme Court Corpus .txt files/alicec1.txt', '../Supreme_Court_Project/Supreme Court Corpus .txt files/americanneedle.txt', '../Supreme_Court_Project/Supreme Court Corpus .txt files/asgrowseed.txt', '../Supreme_Court_Project/Supreme Court Corpus .txt files/asgrowseedd1.txt', '../Supreme_Court_Project/Supreme Court Corpus .txt files/associationformolecularpathology.txt', '../Supreme_Court_Project/Supreme Court Corpus .txt files/associationformolecularpathologyc1.txt', '../Supreme_Court_Project/Supreme Court Corpus .txt files/bilski.txt']
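
The reading step can be tried out without the real corpus. The sketch below writes two tiny made-up files in "mac-roman" encoding to a temporary folder and reads them back; `read_corpus` is a hypothetical helper name, and the file contents are invented for illustration.

```python
import glob
import os
import tempfile


def read_corpus(folder, encoding="mac-roman"):
    """Read every .txt file in folder; return (paths, texts) in sorted order."""
    paths = sorted(glob.glob(os.path.join(folder, "*.txt")))
    texts = []
    for path in paths:
        # the with-block closes each file as soon as it has been read
        with open(path, encoding=encoding) as handle:
            texts.append(handle.read())
    return paths, texts


with tempfile.TemporaryDirectory() as folder:
    # Two made-up files; the curly quotes only survive with the right encoding
    samples = {"case_a.txt": "“Quoted” opinion text", "case_b.txt": "Plain opinion text"}
    for name, body in samples.items():
        with open(os.path.join(folder, name), "w", encoding="mac-roman") as handle:
            handle.write(body)
    paths, texts = read_corpus(folder)

print(texts)
```

Sorting the paths makes the order of the two lists reproducible, since `glob` does not guarantee any particular order.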


In [11]:
#We want to get rid of the long directory part of each path and keep just the file name, so we can merge
#this with the excel data from before. We drop everything up to the last "/".
filenamelistnew = [e.split("/")[-1] for e in filenamelist]
print(filenamelistnew)

['aereo.txt', 'aereod1.txt', 'alice.txt', 'alicec1.txt', 'americanneedle.txt', 'asgrowseed.txt', 'asgrowseedd1.txt', 'associationformolecularpathology.txt', 'associationformolecularpathologyc1.txt', 'bilski.txt', 'bilskic1.txt', 'bilskic2.txt', 'bonitoboats.txt', 'bowman.txt', 'campbell.txt', 'campbellc1.txt', 'caraco.txt', 'caracoc1.txt', 'cardinalchemical.txt', 'cardinalchemicalc1.txt', 'ccnv.txt', 'chakrabarty.txt', 'chakrabartyd1.txt', 'christianson.txt', 'christiansonc1.txt', 'collegesavingsbank.txt', 'collegesavingsbankd1.txt', 'collegesavingsbankd2.txt', 'commilusa.txt', 'commilusad1.txt', 'cooperindustries.txt', 'cooperindustriesc1.txt', 'cooperindustriesc2.txt', 'cooperindustriesd1.txt', 'dastar.txt', 'dawsonchemical.txt', 'dawsonchemicald1.txt', 'dawsonchemicald2.txt', 'dennison.txt', 'diamond.txt', 'diamondd1.txt', 'dickinson.txt', 'dickinsond1.txt', 'ebay.txt', 'ebayc1.txt', 'ebayc2.txt', 'eldred.txt', 'eldredd1.txt', 'eldredd2.txt', 'elililly.txt', 'elilillyd1.txt', 'feist

In [12]:
#We also have to get rid of the .txt after the filenames because the excel document has filenames without the .txt
filename_list = [e.split(".")[0] for e in filenamelistnew]
print(filename_list)

['aereo', 'aereod1', 'alice', 'alicec1', 'americanneedle', 'asgrowseed', 'asgrowseedd1', 'associationformolecularpathology', 'associationformolecularpathologyc1', 'bilski', 'bilskic1', 'bilskic2', 'bonitoboats', 'bowman', 'campbell', 'campbellc1', 'caraco', 'caracoc1', 'cardinalchemical', 'cardinalchemicalc1', 'ccnv', 'chakrabarty', 'chakrabartyd1', 'christianson', 'christiansonc1', 'collegesavingsbank', 'collegesavingsbankd1', 'collegesavingsbankd2', 'commilusa', 'commilusad1', 'cooperindustries', 'cooperindustriesc1', 'cooperindustriesc2', 'cooperindustriesd1', 'dastar', 'dawsonchemical', 'dawsonchemicald1', 'dawsonchemicald2', 'dennison', 'diamond', 'diamondd1', 'dickinson', 'dickinsond1', 'ebay', 'ebayc1', 'ebayc2', 'eldred', 'eldredd1', 'eldredd2', 'elililly', 'elilillyd1', 'feist', 'feltner', 'feltnerc1', 'festo', 'floridaprepaid', 'floridaprepaidd1', 'fogerty', 'fogertyc1', 'ftc', 'ftcd1', 'generalmotors', 'generalmotorsc1', 'globaltechappliances', 'globaltechappliancesd1', 'gro
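
The two stripping steps above can also be done in one go with the standard library's `os.path`. This is a minimal sketch of an alternative, not the approach used in the cells above; `file_stem` is a hypothetical helper name.

```python
import os


def file_stem(path):
    """Return the file name without its directory and without its extension."""
    name = os.path.basename(path)        # drop the directory part
    stem, _extension = os.path.splitext(name)  # drop only the final extension
    return stem


example = "../Supreme_Court_Project/Supreme Court Corpus .txt files/aereo.txt"
print(file_stem(example))
```

One reason to prefer `os.path.splitext` is that it only removes the last extension, whereas splitting on "." and keeping the first piece would cut a name short if it contained another dot.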

In [13]:
#We print the text for the first ten files
print(textlist[:10])



In [14]:
#Here, we turn the filenamelist and textlist into a dataframe with corresponding entries

#We create an empty dataframe with the columns filename and text
df_text = pd.DataFrame(columns=["filename","text"])

#We enter our filename_list into the filename column, and the textlist into the text column, then look at our dataframe
df_text["filename"] = filename_list
df_text["text"] = textlist
df_text

Unnamed: 0,filename,text
0,aereo,The Copyright Act of 1976 gives a copyright ow...
1,aereod1,This case is the latest skirmish in the long-r...
2,alice,The patents at issue in this case disclose a c...
3,alicec1,I adhere to the view that any “claim that mere...
4,americanneedle,"“Every contract, combination in the form of a ..."
5,asgrowseed,"The Plant Variety Protection Act of 1970, 7 U...."
6,asgrowseedd1,The key to this statutory puzzle is the meanin...
7,associationformolecularpathology,"Respondent Myriad Genetics, Inc. (Myriad), dis..."
8,associationformolecularpathologyc1,"I join the judgment of the Court, and all of i..."
9,bilski,The question in this case turns on whether a p...
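
An equivalent way to build this table, sketched here with two made-up entries, is to pass both lists to `pd.DataFrame` as a dictionary and to check first that they have the same length, so that names and texts cannot get out of step.

```python
import pandas as pd

# Toy stand-ins for the two lists built above (values are placeholders)
filename_list = ["case_a", "case_b"]
textlist = ["first opinion text", "second opinion text"]

# Each file name must pair with exactly one text
if len(filename_list) != len(textlist):
    raise ValueError("filename_list and textlist have different lengths")

df_text = pd.DataFrame({"filename": filename_list, "text": textlist})
print(df_text.shape)
```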


### Merging the Original Excel Dataframe with the Text Dataframe

In [15]:
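#We merge the excel dataframe from above with the text dataframe, specifying that we merge entries on the filename
#column in each dataframe, so that each text lands in the row of its case. By default pd.merge does an inner
#join, so a row only survives if its filename appears in both dataframes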
df_final = pd.merge(df_cases, df_text, on = 'filename')
df_final

Unnamed: 0,Case name,IP,Date,Author,Joined By,Part of opinion,Suffix,filename,text
0,American Needle Inc. v. NFL,Trademark,2010,Stevens,9,Main,,americanneedle,"“Every contract, combination in the form of a ..."
1,"KP Permanent Makeup, Inc. v. Lasting Impressio...",Trademark,2004,Souter,9,Main,,kppermanentmakeup,* Justice SCALIA joins all but footnotes 4 and...
2,"Moseley v. V Secret Catalogue, Inc.",Trademark,2003,Stevens,9,Main,,moseley,* Justice SCALIA joins all but Part III of thi...
3,Dastar Corp. v. Twentieth Century Fox Film Corp.,Trademark,2003,Scalia,8,Main,,dastar,"In this case, we are asked to decide whether §..."
4,"TrafFix Devices, Inc. v. Marketing Displays, Inc.",Trademark,2001,Kennedy,9,Main,,traffixdevices,Temporary road signs with warnings like “Road ...
5,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Stevens,7,Main,,cooperindustries,A jury found petitioner guilty of unfair compe...
6,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Thomas,1,Concurrence,c1,cooperindustriesc1,I continue to believe that the Constitution do...
7,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Scalia,1,Concurrence,c2,cooperindustriesc2,I was (and remain) of the view that excessive ...
8,Cooper Industries v. Leatherman Tool Group,Trademark,2001,Ginsburg,1,Dissent,d1,cooperindustriesd1,"In Gasperini v. Center for Humanities, Inc., 5..."
9,"Wal-Mart Stores Inc. v. Samara Brothers, Inc.",Trademark,2000,Scalia,9,Main,,walmartstores,"In this case, we decide under what circumstanc..."


In [16]:
#I sort the final dataframe by IP, then by filename, to check against our original excel spreadsheet and make sure
#we have all the entries. We are missing five: the three that we don't have text files for, and two that are mislabelled
df_final.sort_values(by=["IP","filename"])

Unnamed: 0,Case name,IP,Date,Author,Joined By,Part of opinion,Suffix,filename,text
32,"American Broadcasting Cos., Inc. v. Aereo, Inc.",Copyright,2014,Breyer,6,Main,,aereo,The Copyright Act of 1976 gives a copyright ow...
33,"American Broadcasting Cos., Inc. v. Aereo, Inc.",Copyright,2014,Scalia,3,Dissent,d1,aereod1,This case is the latest skirmish in the long-r...
55,"Campbell v. Acuff-Rose Music, Inc.",Copyright,1994,Souter,9,Main,,campbell,We are called upon to decide whether 2 Live Cr...
56,"Campbell v. Acuff-Rose Music, Inc.",Copyright,1994,Kennedy,1,Concurrence,c1,campbellc1,I agree that remand is appropriate and join th...
63,Community for Creative Non-Violence v. Reid,Copyright,1989,Marshall,9,Main,,ccnv,"In this case, an artist and the organization t..."
46,Eldred v. Ashcroft,Copyright,2003,Ginsburg,8,Main,,eldred,This case concerns the authority the Constitut...
47,Eldred v. Ashcroft,Copyright,2003,Stevens,1,Dissent,d1,eldredd1,"Writing for a unanimous Court in 1964, Justice..."
48,Eldred v. Ashcroft,Copyright,2003,Breyer,1,Dissent,d2,eldredd2,The Constitution’s Copyright Clause grants Con...
59,"Feist Publications, Inc. v. Rural Telephone Se...",Copyright,1991,O'Connor,9,Main,,feist,This case requires us to clarify the extent of...
51,"Feltner v. Columbia Pictures Television, Inc.",Copyright,1998,Thomas,9,Main,,feltner,Section 504(c) of the Copyright Act of 1976 pe...
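
Rather than comparing against the spreadsheet by eye, the unmatched entries can be listed directly. The sketch below uses an outer merge with `indicator=True`, which adds a `_merge` column saying whether each filename was found in the left table, the right table, or both. The filenames are made up for illustration and are not the actual missing or mislabelled ones.

```python
import pandas as pd

# Toy stand-ins: spreadsheet rows and text files (filenames are placeholders)
cases = pd.DataFrame({"filename": ["case_a", "case_b", "case_c"]})
texts = pd.DataFrame({
    "filename": ["case_a", "case_b1", "case_d"],
    "text": ["text a", "text b1", "text d"],
})

# how="outer" keeps unmatched rows from both sides; indicator=True labels them
audit = cases.merge(texts, on="filename", how="outer", indicator=True)

# In the spreadsheet but no text file with that name
no_text = audit.loc[audit["_merge"] == "left_only", "filename"].tolist()
# A text file exists but no spreadsheet row has that name
no_metadata = audit.loc[audit["_merge"] == "right_only", "filename"].tolist()

print(no_text, no_metadata)
```

A near-match between the two lists, such as `case_b` and `case_b1` here, is the kind of pattern a mislabelled filename would produce.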


## II. Topic Modelling

Even though we don't yet have all the files we need for the final topic model, the code above can easily be rerun to include them in our dataframe once we get them.

I wanted to take what we have, all our data minus the five missing files, and show you what our topic model could look like and the output we'll get. It will change a bit once we have all the files, but this should be a good proxy for what we can get out of this data.

I will run a topic model that produces 6 topics. We can and will change that number in the future: I'll run multiple models with different numbers of topics so you can investigate at what point the topics become interpretable and meaningful.

### 1. Fit a Topic Model, using LDA
Now we're ready to fit the model. This requires two scikit-learn classes: CountVectorizer, which turns the text into word counts, and LatentDirichletAllocation, which fits the topic model.


In [17]:
#Here we have our imported functions for Topic Modeling and LDA that we will use in our analysis
#I ran the topic model for 6 topics  


#### Adapted from:
#Author: Olivier Grisel <olivier.grisel@ensta.org>
#         Lars Buitinck
#         Chyi-Kwei Yau <chyikwei.yau@gmail.com>
# License: BSD 3 clause

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

n_samples = 2000
n_topics = 6
n_top_words = 50

##This is a function to print out the top words for each topic in a pretty way.
#Don't worry too much about understanding every line of this code.
def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print("\nTopic #%d:" % topic_idx)
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
    print()
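
The slice inside `print_top_words` is the least obvious part. `argsort()` returns the word positions ordered from lowest to highest weight, and the slice `[:-n_top_words - 1:-1]` walks backwards from the end, so it yields the `n_top_words` highest-weighted positions in descending order. A minimal sketch with made-up weights and a four-word vocabulary:

```python
import numpy as np

# Made-up topic weights, one per vocabulary word
vocab = ["court", "patent", "trade", "claim"]
weights = np.array([0.1, 3.0, 0.5, 2.0])
n_top_words = 2

# argsort is ascending; the reversed slice keeps the last n_top_words entries
top_positions = weights.argsort()[:-n_top_words - 1:-1]
top_words = [vocab[i] for i in top_positions]

print(top_words)
```

With these weights the two top words are "patent" and then "claim".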

In [18]:
# Vectorize our text using CountVectorizer
print("Extracting tf features for LDA...")
tf_vectorizer = CountVectorizer(max_df=0.80, min_df=50,
                                max_features=None,
                                stop_words='english'
                                )

tf = tf_vectorizer.fit_transform(df_final.text)

Extracting tf features for LDA...
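
The two filters in the vectorizer work on different scales: a float `max_df` is a proportion of documents, while an integer `min_df` is an absolute number of documents. So `max_df=0.80, min_df=50` keeps only words that appear in at least 50 opinions and in no more than 80% of them. The sketch below shows the same idea on four made-up documents, with `min_df` lowered to 2 so the toy corpus still has words left.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Four made-up documents
docs = [
    "patent claim invention",
    "patent claim trademark",
    "patent copyright work",
    "patent copyright statute",
]

# max_df=0.80: drop words found in more than 80% of documents
# min_df=2:    drop words found in fewer than 2 documents
vectorizer = CountVectorizer(max_df=0.80, min_df=2)
counts = vectorizer.fit_transform(docs)

# vocabulary_ maps each kept word to its column in the count matrix
kept_words = sorted(vectorizer.vocabulary_)
print(kept_words, counts.shape)
```

Here "patent" is dropped for appearing in every document, and the words that appear only once are dropped for being too rare.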


In [19]:
print("Fitting LDA models with tf features, "
      "n_samples=%d and n_topics=%d..."
      % (n_samples, n_topics))

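#define the LDA model with the desired options
#Check the scikit-learn documentation for LatentDirichletAllocation to look through the options
#Note: n_topics is the older name of this option; newer scikit-learn versions call it n_components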
lda = LatentDirichletAllocation(n_topics=n_topics, max_iter=20,
                                learning_method='online',
                                learning_offset=80.,
                                total_samples=n_samples,
                                random_state=0)
#fit the model
lda.fit(tf)

Fitting LDA models with tf features, n_samples=2000 and n_topics=6...




LatentDirichletAllocation(batch_size=128, doc_topic_prior=None,
             evaluate_every=-1, learning_decay=0.7,
             learning_method='online', learning_offset=80.0,
             max_doc_update_iter=100, max_iter=20, mean_change_tol=0.001,
             n_components=10, n_jobs=1, n_topics=6, perp_tol=0.1,
             random_state=0, topic_word_prior=None, total_samples=2000,
             verbose=0)
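
Fitting several models with different numbers of topics could look roughly like this. It is a sketch under assumptions: the eight documents are made up, the topic counts 2 and 3 are arbitrary, and it uses the `n_components` parameter name from newer scikit-learn versions rather than `n_topics`.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Made-up mini corpus standing in for the opinions
docs = [
    "patent claim invention process",
    "patent invention claim art",
    "copyright work author statute",
    "copyright author work term",
    "trademark mark product competition",
    "trademark product mark word",
    "patent process art invention",
    "copyright statute term work",
]
counts = CountVectorizer().fit_transform(docs)

models = {}
for k in (2, 3):  # arbitrary topic counts for illustration
    lda = LatentDirichletAllocation(n_components=k, learning_method="online",
                                    max_iter=5, random_state=0)
    models[k] = lda.fit(counts)
    # components_ has one row of word weights per topic
    print(k, lda.components_.shape)

# transform gives each document's mix of topics; each row sums to 1
doc_topics = models[3].transform(counts)
print(np.round(doc_topics[0], 2))
```

The document-topic mix from `transform` is what a later comparison of topics across IP types would be built on.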

In [20]:
#print the top words per topic, using the function defined above.
#Here are the top 50 words for the six topics we specified. This is the output we will consult to try to
#interpret the topics for meaning. In the future, with all the files, we will run multiple models with different
#numbers of topics and choose the one that is most useful to you, then proceed with further analyses, such as
#which topic is most present in which type of IP

print("\nTopics in LDA model:")
#Note: on newer scikit-learn versions this method is called get_feature_names_out()
tf_feature_names = tf_vectorizer.get_feature_names()
print_top_words(lda, tf_feature_names, n_top_words)


Topics in LDA model:

Topic #0:
patent federal infringement claim id state invention claims process circuit congress patents doctrine cases district judgment appeals question issue courts did supra decision subject practice amendment application petitioner action rule prior held petitioners matter art time corp suit laws patented respondent damages fed use parties right decisions construction power new

Topic #1:
trade 43 meaning product marks 15 trademark protection appeals petitioner cause circuit competition supra courts person word particular merely 102 action district used did business common likely id second test protected finding issue new line words general held provides 11 based judgment corp 112 use principles including 1988 commerce establish

Topic #2:
copyright work works use congress rights right original 1976 public statute owner 17 id term new time ante statutory title grant supra protection years petitioners brief cong 109 language infringement appeals exclusive secti