## Machine Learning

This notebook summarizes the performance of several machine learning models on the dataset.

First, I will import some basic Python packages and load the dataset into the `df` variable.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.options.display.max_columns = 500

%matplotlib inline

In [2]:
df = pd.read_csv('df_model_5.csv', index_col=0)

In [5]:
df.head()

Unnamed: 0,casename,date,id,judge,opinion,type,judge_count,year,word_count,sentence_count,avg_sent_length,polarity,subjectivity
0,ACT-UP Triangle v. Commission for Health Services,1997-04-11,53839,frye,"FRYE, Justice.\nThis case involves the adoptio...",majority,415,1997,4364,214,20.392523,0.102371,0.473721
1,Mahoney v. Ronnie’s Road Service,1997-03-07,53841,per_curiam,PER CURIAM.\nAFFIRMED.\nJustice PARKER did not...,majority,6440,1997,16,3,5.333333,0.0,0.0
2,State v. Westbrooks,1996-12-06,53843,parker_sarah,"PARKER, Justice.\nDefendant, Donna Sue Westbro...",majority,456,1996,9467,405,23.375309,-0.012108,0.321622
3,State v. Conner,1997-02-10,53847,whichard,"WHICHARD, Justice.\nOn 13 November 1990, defen...",majority,661,1997,6111,268,22.802239,-0.024927,0.415121
4,Fulton Corp. v. Faulkner,1997-02-10,53848,webb,"WEBB, Justice.\nThis case brings to the Court ...",majority,801,1997,1346,75,17.946667,0.051909,0.415155


In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 80749 entries, 0 to 80748
Data columns (total 13 columns):
casename           80749 non-null object
date               80748 non-null object
id                 80749 non-null int64
judge              80749 non-null object
opinion            80749 non-null object
type               80749 non-null object
judge_count        80749 non-null int64
year               80749 non-null int64
word_count         80749 non-null int64
sentence_count     80749 non-null int64
avg_sent_length    80749 non-null float64
polarity           80749 non-null float64
subjectivity       80749 non-null float64
dtypes: float64(3), int64(5), object(5)
memory usage: 8.6+ MB


## Preparing the Features for Modeling

I will start by testing how some classifiers perform using only the descriptive features of the opinions that I examined when exploring the data (opinion type, year, length, and sentiment), without the text of the opinions themselves. To do this, I need to convert those features into a format suitable for the machine learning models.

### Creating the Machine Learning Dataframe

I will begin by creating a separate dataframe, `df_ml`, to store the features used to train the models. I will start by adding the id number of each opinion for identification purposes, as well as the judge, which is the label the models will attempt to predict.

In [6]:
df_ml = df.loc[:, ['id', 'judge']]
df_ml.head()

Unnamed: 0,id,judge
0,53839,frye
1,53841,per_curiam
2,53843,parker_sarah
3,53847,whichard
4,53848,webb


### Converting the Categorical Features into Dummy Variables

The dataset has two features that I will treat as categorical: the type of opinion (majority, concurrence, etc.) and the year the opinion was issued. I will convert both into dummy (one-hot) columns using `pd.get_dummies` and then add the dummies to the `df_ml` dataframe:
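Here is a toy illustration of what `pd.get_dummies` produces. It uses a small made-up frame in place of `df`. Each distinct value becomes a column named `<prefix>_<value>`, and the categories are sorted. Each row has a single 1. The `dtype=int` argument is an addition that does not appear in the cells below. It makes the dummies print as 0/1, because recent pandas versions otherwise return `True`/`False` columns.

```python
import pandas as pd

# Toy stand-in for the 'type' and 'year' columns of df (values are illustrative only)
toy = pd.DataFrame({
    'type': ['majority', 'dissent', 'majority', 'concurrence'],
    'year': [1997, 1996, 1997, 1997],
})

# One 0/1 column per distinct value, named '<prefix>_<value>', categories sorted
type_dummies = pd.get_dummies(toy['type'], prefix='type', dtype=int)
year_dummies = pd.get_dummies(toy['year'], prefix='year', dtype=int)

print(list(type_dummies.columns))  # ['type_concurrence', 'type_dissent', 'type_majority']
print(list(year_dummies.columns))  # ['year_1996', 'year_1997']

# Concatenating along axis=1 aligns the dummies with the original rows by index
combined = pd.concat([toy, type_dummies, year_dummies], axis=1)
print(combined)
```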

In [7]:
# create dummy columns for 'type' feature
df_dummies = pd.get_dummies(df['type'], prefix='type')
df_dummies.head()

Unnamed: 0,type_concurrence,type_concurring-in-part-and-dissenting-in-part,type_dissent,type_majority,type_rehearing
0,0,0,0,1,0
1,0,0,0,1,0
2,0,0,0,1,0
3,0,0,0,1,0
4,0,0,0,1,0


In [9]:
# add the dummy columns to the df_ml dataframe
df_ml = pd.concat([df_ml, df_dummies], axis=1)
df_ml.head()

Unnamed: 0,id,judge,type_concurrence,type_concurring-in-part-and-dissenting-in-part,type_dissent,type_majority,type_rehearing
0,53839,frye,0,0,0,1,0
1,53841,per_curiam,0,0,0,1,0
2,53843,parker_sarah,0,0,0,1,0
3,53847,whichard,0,0,0,1,0
4,53848,webb,0,0,0,1,0


In [10]:
# create dummy columns for 'year' feature
df_dummies = pd.get_dummies(df['year'], prefix='year')
df_dummies.head()

Unnamed: 0,year_1779,year_1784,year_1787,...,year_1995,year_1996,year_1997,year_1998,...,year_2017
0,0,0,0,...,0,0,1,0,...,0
1,0,0,0,...,0,0,1,0,...,0
2,0,0,0,...,0,1,0,0,...,0
3,0,0,0,...,0,0,1,0,...,0
4,0,0,0,...,0,0,1,0,...,0


In [11]:
# add the dummy columns to the df_ml dataframe
df_ml = pd.concat([df_ml, df_dummies], axis=1)
df_ml.head()

Unnamed: 0,id,judge,type_concurrence,type_concurring-in-part-and-dissenting-in-part,type_dissent,type_majority,type_rehearing,year_1779,year_1784,...,year_1996,year_1997,...,year_2017
0,53839,frye,0,0,0,1,0,0,0,...,0,1,...,0
1,53841,per_curiam,0,0,0,1,0,0,0,...,0,1,...,0
2,53843,parker_sarah,0,0,0,1,0,0,0,...,1,0,...,0
3,53847,whichard,0,0,0,1,0,0,0,...,0,1,...,0
4,53848,webb,0,0,0,1,0,0,0,...,0,1,...,0


### Normalizing the Continuous Variables

The dataset has several continuous variables that need to be normalized: the word count, sentence count, and average sentence length. The polarity scores from `TextBlob` range from -1 to 1, so I will also rescale them to the 0-to-1 range. The subjectivity scores already lie between 0 and 1, so they can be used as they are.

In [15]:
# Import and instantiate MinMaxScaler
from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()

# Normalize the relevant features (columns 8-11 of df), selected by name
cols_to_scale = ['word_count', 'sentence_count', 'avg_sent_length', 'polarity']
x = df[cols_to_scale].values
x_scaled = min_max_scaler.fit_transform(x)

# Save the normalized values into a dataframe that shares df's index
df_scaled = pd.DataFrame(x_scaled, columns=cols_to_scale, index=df.index)
df_scaled.head()

Unnamed: 0,word_count,sentence_count,avg_sent_length,polarity
0,0.088804,0.098247,0.093527,0.501317
1,0.000265,0.000923,0.018977,0.444444
2,0.192718,0.186347,0.108294,0.437718
3,0.124379,0.123155,0.105457,0.430596
4,0.027348,0.034133,0.081419,0.473283
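`MinMaxScaler` rescales each column with (x - min) / (max - min). It uses the minimum and maximum observed in the data, not the theoretical -1 to 1 bounds of polarity. That is why a polarity of 0.0 (row 1) becomes 0.444444 instead of 0.5. Here is a small check with hypothetical polarity values. The values are made up and are not the dataset's actual minimum and maximum.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical polarity column whose observed range (-0.5 to 0.75) is narrower than -1 to 1
polarity = np.array([[-0.5], [0.0], [0.75]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(polarity)

# The same transformation written out by hand: (x - min) / (max - min), per column
manual = (polarity - polarity.min()) / (polarity.max() - polarity.min())

print(scaled.ravel())                          # approximately [0.0, 0.4, 1.0]
print(scaler.data_min_, scaler.data_max_)      # observed bounds learned during fit
```

Because 0.0 sits 0.5 / 1.25 = 0.4 of the way from the observed minimum to the observed maximum, it maps to 0.4 here rather than 0.5.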


In [17]:
# add the normalized features to the df_ml dataframe
# (run this cell only once: re-running it appends a second copy of the scaled columns)
df_ml = pd.concat([df_ml, df_scaled], axis=1)

# add the already-normalized subjectivity column to the df_ml dataframe
df_ml['subjectivity'] = df.loc[:, 'subjectivity']

df_ml.head()

Unnamed: 0,id,judge,type_concurrence,type_concurring-in-part-and-dissenting-in-part,type_dissent,type_majority,type_rehearing,year_1779,year_1784,...,year_1996,year_1997,...,year_2017,word_count,sentence_count,avg_sent_length,polarity,word_count.1,sentence_count.1,avg_sent_length.1,polarity.1,subjectivity
0,53839,frye,0,0,0,1,0,0,0,...,0,1,...,0,0.088804,0.098247,0.093527,0.501317,0.088804,0.098247,0.093527,0.501317,0.473721
1,53841,per_curiam,0,0,0,1,0,0,0,...,0,1,...,0,0.000265,0.000923,0.018977,0.444444,0.000265,0.000923,0.018977,0.444444,0.0
2,53843,parker_sarah,0,0,0,1,0,0,0,...,1,0,...,0,0.192718,0.186347,0.108294,0.437718,0.192718,0.186347,0.108294,0.437718,0.321622
3,53847,whichard,0,0,0,1,0,0,0,...,0,1,...,0,0.124379,0.123155,0.105457,0.430596,0.124379,0.123155,0.105457,0.430596,0.415121
4,53848,webb,0,0,0,1,0,0,0,...,0,1,...,0,0.027348,0.034133,0.081419,0.473283,0.027348,0.034133,0.081419,0.473283,0.415155
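The output above contains each scaled column twice (`word_count` and `word_count.1`, and so on). The 245-column `X.columns` listing further down repeats them as well. This is most likely because the concatenation cell was executed more than once. The accuracies reported below were computed with the duplicates included.

Here is a minimal sketch of how repeated column labels could be detected and dropped. It uses a small hypothetical frame rather than `df_ml` itself.

```python
import pandas as pd

# Hypothetical frame reproducing the problem: concatenating the same scaled
# columns twice produces repeated column labels
scaled = pd.DataFrame({'word_count': [0.088804, 0.000265], 'polarity': [0.501317, 0.444444]})
df_ml_demo = pd.concat([scaled, scaled], axis=1)
print(list(df_ml_demo.columns))  # ['word_count', 'polarity', 'word_count', 'polarity']

# Index.duplicated() flags every repeat after the first occurrence; keep the rest
deduped = df_ml_demo.loc[:, ~df_ml_demo.columns.duplicated()]
print(list(deduped.columns))  # ['word_count', 'polarity']
```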


In [18]:
# save the df_ml dataframe to the local drive
df_ml.to_csv('df_ml_1.csv')
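When `DataFrame.to_csv` is given a path or buffer, it writes the file and returns `None`. By default it also writes the index as the first, unnamed column, which is why the CSV files in this notebook are read back with `index_col=0`. Here is a small round trip through an in-memory buffer, with a two-row frame standing in for `df_ml`.

```python
import io
import pandas as pd

# Two-row stand-in for df_ml
df_demo = pd.DataFrame({'id': [53839, 53841], 'judge': ['frye', 'per_curiam']})

buffer = io.StringIO()
result = df_demo.to_csv(buffer)   # writes the index as the first column
print(result)                     # None: nothing useful to assign

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # turn that first column back into the index
print(restored)
```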

## Training Some Benchmark Models

In [36]:
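# drop per curiam opinions, which are issued by the court as a whole rather than by an individual judge
df_ml2 = df_ml[df_ml['judge'] != 'per_curiam']
df_ml2.info()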

<class 'pandas.core.frame.DataFrame'>
Int64Index: 74309 entries, 0 to 80747
Columns: 247 entries, id to subjectivity
dtypes: float64(9), int64(1), object(1), uint8(236)
memory usage: 23.5+ MB


In [45]:
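# divide the data into label and features for use in ml models
# (columns 0 and 1 are 'id' and 'judge', so the features start at column 2)
# note: this uses the full df_ml (all 80,749 opinions, including per curiam),
# not the df_ml2 subset created above
X = df_ml.iloc[:, 2:]
y = df_ml.loc[:, 'judge']
X.columns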

Index(['type_concurrence', 'type_concurring-in-part-and-dissenting-in-part',
       'type_dissent', 'type_majority', 'type_rehearing', 'year_1779',
       'year_1784', 'year_1787', 'year_1789', 'year_1790',
       ...
       'year_2017', 'word_count', 'sentence_count', 'avg_sent_length',
       'polarity', 'word_count', 'sentence_count', 'avg_sent_length',
       'polarity', 'subjectivity'],
      dtype='object', length=245)

In [47]:
# split the data into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6, stratify=y)
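With the default `test_size` of 0.25, `train_test_split` rounds the test-set size up. The 80,749 opinions are therefore split into 60,561 training rows and 20,188 test rows. `stratify=y` keeps each judge's share of opinions the same in both sets. Here is a small sketch with made-up labels for two hypothetical judges.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 80 opinions by judge 'a' and 20 by judge 'b'
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array(['a'] * 80 + ['b'] * 20)

# Default test_size=0.25; stratify preserves the 80/20 label ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=6, stratify=y_demo)

print(len(X_tr), len(X_te))                    # 75 25
print((y_tr == 'b').sum(), (y_te == 'b').sum())  # 15 5
```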

### Logistic Regression

In [48]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=6)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
print('Accuracy on training set = {}'.format(lr.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(lr.score(X_test, y_test)))



Accuracy on training set = 0.28983999603705357
Accuracy on test set = 0.2698632851198732


### Stochastic Gradient Descent Classifier

In [49]:
from sklearn.linear_model import SGDClassifier
sgdc = SGDClassifier(random_state=6)
sgdc.fit(X_train, y_train)
y_pred_sgdc = sgdc.predict(X_test)
print('Accuracy on training set = {}'.format(sgdc.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(sgdc.score(X_test, y_test)))

Accuracy on training set = 0.23133699905880023
Accuracy on test set = 0.22359817713493163


### Linear Support Vector Machine Classifier

In [50]:
from sklearn.svm import LinearSVC
svc = LinearSVC(random_state=6)
svc.fit(X_train, y_train)
y_pred_svc = svc.predict(X_test)
print('Accuracy on training set = {}'.format(svc.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(svc.score(X_test, y_test)))

Accuracy on training set = 0.302174666864814
Accuracy on test set = 0.2855656825837131


### Multinomial Naive Bayes

In [51]:
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
nb.fit(X_train, y_train)
y_pred_nb = nb.predict(X_test)
print('Accuracy on training set = {}'.format(nb.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(nb.score(X_test, y_test)))

Accuracy on training set = 0.2104324565314311
Accuracy on test set = 0.20001981375074301


### Random Forest Classifier

In [52]:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=6)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print('Accuracy on training set = {}'.format(rf.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(rf.score(X_test, y_test)))



Accuracy on training set = 0.9885404798467661
Accuracy on test set = 0.24148008718050326
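Each of the model cells above repeats the same fit-and-score pattern. Here is a sketch of a loop that collects the same train and test accuracies and prints them as a Markdown table sorted by test accuracy.

The sketch makes several substitutions:
- It runs on synthetic data from `make_classification`, so its numbers will not match the notebook's.
- To apply it to the notebook, replace the synthetic `X_train`, `X_test`, `y_train`, and `y_test` with the real ones.
- `max_iter=1000` for logistic regression is an addition, included only to avoid convergence warnings on the synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the notebook's features. MultinomialNB requires
# non-negative inputs, so the features are min-max scaled to [0, 1] first.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=4, random_state=6)
X = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6, stratify=y)

models = {
    'Linear SVM': LinearSVC(random_state=6),
    'Logistic Regression': LogisticRegression(random_state=6, max_iter=1000),
    'Random Forest': RandomForestClassifier(random_state=6),
    'SGD': SGDClassifier(random_state=6),
    'Naive Bayes': MultinomialNB(),
}

# Fit each model and record (train accuracy, test accuracy)
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

# Print a Markdown table, best test accuracy first
print('| Model | Train accuracy | Test accuracy |')
print('|---|---|---|')
for name, (train_acc, test_acc) in sorted(results.items(), key=lambda kv: kv[1][1], reverse=True):
    print('| {} | {:.4f} | {:.4f} |'.format(name, train_acc, test_acc))
```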


### Summary

Here is a summary of the results:

| Model               | Train accuracy | Test accuracy |
|---------------------|----------------|---------------|
| Linear SVM          | 0.3022 | 0.2856 |
| Logistic Regression | 0.2898 | 0.2699 |
| Random Forest       | 0.9885 | 0.2415 |
| SGD                 | 0.2313 | 0.2234 |
| Naive Bayes         | 0.2104 | 0.2000 |

The best-performing models, the linear SVM and logistic regression, predicted the correct judge for about 27-29% of the opinions in the test set. Every model scores higher on the training set than on the test set. For the linear models and Naive Bayes, the gap is at most about 2 percentage points. The Random Forest Classifier, by contrast, is severely overfit (98.9% training accuracy versus 24.1% test accuracy). While this is far from our ultimate goal, it suggests that these features carry some signal about authorship.
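Whether roughly 28% accuracy reflects real signal depends on how the opinions are distributed across judges. A model that always predicted the most prolific judge sets the practical floor. One way to measure that floor is a majority-class baseline.

Here is a sketch with hypothetical labels. On the notebook's data, it would be fit on `X_train` and `y_train` and scored on `X_test` and `y_test` instead.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical, imbalanced labels standing in for the judge column
y_train = np.array(['judge_a'] * 6 + ['judge_b'] * 3 + ['judge_c'] * 1)
y_test = np.array(['judge_a', 'judge_b', 'judge_a', 'judge_c'])

# DummyClassifier ignores the features, so placeholder arrays are enough here
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

# Always predict the most frequent training label
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_train, y_train)
print(baseline.score(X_test, y_test))  # 0.5: 'judge_a' is right for 2 of the 4 test labels
```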

Next, I will use `TfidfVectorizer` to convert the text of the opinions into additional features and see whether they improve model performance.
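Here is a minimal sketch of how TF-IDF text features and the existing descriptive features could be combined into one matrix. It assumes the text is vectorized in memory, whereas below the notebook reads the vectors from a saved CSV. The snippets and feature values are illustrative only. `scipy.sparse.hstack` keeps the result sparse, which matters when the vocabulary runs to thousands of terms.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical opinion snippets and matching descriptive features
texts = [
    'This case involves the adoption of rules.',
    'AFFIRMED.',
    'Defendant appeals from the judgment of the trial court.',
]
other_features = np.array([[0.0888, 0.4737],   # e.g. scaled word_count, subjectivity
                           [0.0003, 0.0],
                           [0.1927, 0.3216]])

# Text -> sparse TF-IDF matrix with one column per vocabulary term
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(texts)

# Append the dense features as extra columns; the result stays sparse
X_combined = hstack([X_text, csr_matrix(other_features)]).tocsr()
print(X_combined.shape)  # (3, vocabulary size + 2)
```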

## Vectorized Text Features

As discussed in the Data Preprocessing notebook, I have already used `TfidfVectorizer` on the corpus with varying `min_df` values to build vocabularies.  I will read in the vectorized text files and see how the models perform.  I will first test the smallest vocabulary of 3,677 words/features. 
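`min_df` controls vocabulary size by discarding terms that appear in too few documents. An integer is an absolute document count, and a float is a proportion of documents. Here is a toy illustration on a hypothetical four-document corpus. It is not the notebook's corpus or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus of four "opinions"
corpus = [
    'the trial court erred',
    'the trial court is affirmed',
    'the judgment is reversed',
    'the defendant appeals',
]

# min_df=1 keeps every term; min_df=2 keeps only terms found in at least 2 documents
vocabularies = {}
for min_df in (1, 2):
    vec = TfidfVectorizer(min_df=min_df)
    vec.fit(corpus)
    vocabularies[min_df] = sorted(vec.vocabulary_)
    print(min_df, len(vocabularies[min_df]), vocabularies[min_df])
```

Raising `min_df` from 1 to 2 shrinks this vocabulary from 10 terms to 4 (`court`, `is`, `the`, `trial`).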

In [3]:
# read in the saved tfidf vectors from local .csv file
df_tfidf = pd.read_csv('df_tfidf_02.csv', index_col=0)

In [5]:
df_tfidf.head()

Unnamed: 0,00,000,10,100,101,102,103,...
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...
2,0.002442,0.017602,0.005822,0.0,0.0,0.0,0.003096,...
...
7,0.0,0.0,0.0,0.010273,0.0,0.0,0.263759,0.0,0.0,0.0,0.00655,0.021869,0.0,0.0,0.002279,0.0,0.0,0.0,0.0,0.0,0.0,0.003462,0.0,0.012092,0.0,0.0,0.0,0.002863,0.006502,0.0,0.0,0.0,0.0,0.002739,0.015687,0.0,0.0,0.0,0.0,0.003859,0.0,0.09749,0.0,0.0,0.0,0.001927,0.0,0.0,0.037991,0.0,0.0,0.003259,0.0,0.0,0.0,0.009572,0.007075,0.031321,0.0,0.00469,0.0,0.0,0.009878,0.0,0.008006,0.0,0.0,0.0,0.016176,0.0138,0.003052,0.0,0.008099,0.0,0.0,0.0,0.0,0.015291,0.00342,0.003719,0.021804,0.001275,0.0,0.003672,0.0,0.0,0.0,0.008996,0.00384,0.0,0.0,0.0,0.048532,0.0,0.003027,0.00774,0.0,0.023139,0.018724,0.0,0.0,0.0,0.0,0.007014,0.0,0.0,0.003661,0.0,0.0,0.0,0.033444,0.0,0.0,0.0,0.002258,0.0,0.0,0.0,0.0,0.002258,0.003847,0.0,0.0,0.0,0.011282,0.0,0.0
3,0.0,0.004179,0.009676,0.004278,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005137,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009819,0.0,0.0,0.0,0.0,0.0,0.004998,0.0,0.00375,0.014608,0.009991,0.004744,0.004931,0.02001,0.005131,0.004892,0.0,0.0,0.0,0.003492,0.0,0.0,0.005029,0.005032,0.0,0.0,0.005079,0.005121,0.0,0.0,0.003549,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005129,0.00488,0.0,0.072821,0.0,0.0,0.0,0.0,0.005014,0.0,0.0,0.0,0.0,0.0,0.0,0.003874,0.005084,0.0,0.0,0.0,0.00998,0.0,0.005038,0.0,0.0,0.0,0.007584,0.0,0.004991,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004907,0.003903,0.0,0.00501,0.0,0.0,0.005205,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004556,0.0,0.005152,0.0,0.004435,0.0,0.008708,0.004464,0.004426,0.004467,0.00472,0.018621,0.0,0.0,0.018924,0.009576,0.004902,0.004878,0.058347,0.039825,0.055478,0.0,0.0,0.0,0.0,0.0,0.003508,0.0,0.05277,0.0,0.015088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005128,0.0,0.0,0.0,0.0,0.0,0.010486,0.0,0.0,0.0,0.0,0.0,0.0,0.005102,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005312,0.003788,0.0,0.0,0.0,0.0,0.0,0.0,0.010639,0.0,0.0,0.0,0.004172,0.0,0.0,0.0,0.005278,0.0,0.0,0.0,...,0.0,0.0,0.036542,0.003214,0.0,0.010945,0.012025,0.0,0.0,0.0,0.0,0.0,0.255585,0.0,0.0,0.0191,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010587,0.0,0.0,0.0,0.0,0.0,0.005815,0.00474,0.0,0.005166,0.0,0.113281,0.0,0.003998,0.0,0.0,0.003225,0.0,0.0,0.0,0.004892,0.005622,0.0,0.009206,0.0,0.0,0.0,0.013268,0.0,0.039956,0.0,0.0,0.0,0.0,0.004241,0.0,0.011546,0.006353,0.014575,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005655,0.0,0.0,0.0,0.003897,0.0,0.0,0.0,0.0,0.0,0.002949,0.0,0.0,0.0,0.00591,0.0,0.023181,0.003029,0.0,0.008575,0.006041,0.003058,0.012053,0.0,0.0,0.004657,0.0,0.0,0.0,0.005719,0.004747,0.003981,0.0,0.0,0.033907,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
,0.019701,0.0,0.0,0.012424,0.010885,0.009086,0.0,0.0,0.011364,0.0,0.01177,0.0,0.0,0.0,0.0,0.023016,0.0,0.0,0.0,0.0,0.005918,0.0,0.005403,0.0,0.0,0.0,0.0,0.004553,0.005214,0.0,0.0,0.004234,0.00588,0.0,0.0,0.07169,0.0,0.0,0.0,0.0,0.0,0.0,0.054831,0.005187,0.0,0.0,0.019312,0.008513,0.0,0.015908,0.007838,0.03718,0.0,0.018186,0.0,0.006259,0.016416,0.0,0.013305,0.0,0.0,0.0,0.035154,0.033637,0.017751,0.00939,0.017946,0.0,0.0,0.0,0.0,0.0,0.011368,0.0,0.0,0.012718,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033211,0.0,0.005031,0.019294,0.0,0.0,0.00389,0.017047,0.0,0.00468,0.0,0.0,0.0,0.00645,0.0,0.0,0.0,0.0,0.014822,0.006043,0.0,0.0,0.0,0.004966,0.0,0.0,0.0,0.003753,0.012786,0.004921,0.0,0.0,0.0375,0.005292,0.004544
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.119991,0.0,0.0,0.014679,0.0,0.0,0.015228,0.0,0.0,0.0,0.0,0.0,0.0,0.015045,0.0,0.0,0.021065,0.0,0.0,0.0,0.0,0.0,0.030301,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015031,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015285,0.0,0.0,0.0,0.0,0.015628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01156,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018999,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014233,0.0,0.013539,0.0,0.026269,0.013522,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014597,0.014943,0.044616,0.0,0.015176,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.078174,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046665,0.0,0.0,0.0,0.0,0.012413,0.0,0.016045,0.0,0.0,0.0,0.0,0.0,0.0,0.016078,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015667,0.0,0.0,0.0,0.0,0.0,0.0,0.031139,0.016152,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031685,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.009283,0.0,0.0,0.008342,0.0,0.0,0.0,0.0,0.0,0.0,0.123679,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.263985,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035646,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014473,0.012138,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016592,0.01385,0.0,0.0,0.0,0.0,0.0,0.018191,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034969,0.0,0.0,0.0,0.0,0.0,0.0,0.096243,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011335,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012608,0.03729,0.0,0.0,0.
0,0.011448,0.0,0.0,0.0,0.012912,0.0,0.0,0.0,0.025847,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014464,0.0,0.007669,0.019607,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016945,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [4]:
# create the machine learning dataframe: one row per opinion with its id and judge label
df_ml = df.loc[:, ['id', 'judge']]
df_ml.head()

Unnamed: 0,id,judge
0,53839,frye
1,53841,per_curiam
2,53843,parker_sarah
3,53847,whichard
4,53848,webb


In [7]:
# add the vectorized vocabulary to the df_ml dataframe
df_ml = pd.concat([df_ml, df_tfidf], axis=1)
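`pd.concat(..., axis=1)` pairs rows by index label, not by position. That is why `df_tfidf.index` has to match `df.index` before the two frames are combined. Here is a minimal sketch with toy frames. The names `left`, `right`, and `shifted` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for df_ml (id/judge) and df_tfidf (term weights).
left = pd.DataFrame({'id': [101, 102, 103], 'judge': ['frye', 'webb', 'whichard']})
right = pd.DataFrame({'court': [0.1, 0.0, 0.3]})

# Matching indexes (0, 1, 2): rows line up one-to-one.
aligned = pd.concat([left, right], axis=1)
print(aligned.shape)  # (3, 3)

# With a different index on the right-hand frame, concat aligns on labels
# and fills unmatched cells with NaN instead of raising an error.
shifted = right.set_index(pd.Index([1, 2, 3]))
misaligned = pd.concat([left, shifted], axis=1)
print(misaligned.shape)  # (4, 3)
print(int(misaligned.isna().sum().sum()))  # 3
```

A silent row mismatch like this would attach term weights to the wrong judge. Checking `df_ml.isna().sum().sum() == 0` after the concat is a cheap safeguard.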

In [8]:
df_ml.head()

Unnamed: 0,id,judge,00,000,10,...,you,young,your
0,53839,frye,0.0,0.0,0.0,...,0.0,0.0,0.0
1,53841,per_curiam,0.0,0.0,0.0,...,0.0,0.0,0.0
2,53843,parker_sarah,0.002442,0.017602,0.005822,...,0.011282,0.0,0.0
3,53847,whichard,0.0,0.004179,0.009676,...,0.0375,0.005292,0.004544
4,53848,webb,0.0,0.0,0.0,...,0.0,0.0,0.0

5 rows × 3679 columns


In [9]:
# split df_ml into features (X: the TF-IDF columns) and label (y: the judge)
X = df_ml.iloc[:, 2:]
y = df_ml.loc[:, 'judge']
X.columns

Index(['00', '000', '10', '100', '101', '102', '103', '104', '105', '106',
       ...
       'wrongfully', 'wrote', 'year', 'years', 'yes', 'yet', 'york', 'you',
       'young', 'your'],
      dtype='object', length=3677)
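`iloc[:, 2:]` only works because `id` and `judge` happen to be the first two columns. A name-based alternative keeps the identifier out of the features even if the column order changes. The sketch below runs on a hypothetical miniature `df_ml`.

```python
import pandas as pd

# Hypothetical miniature of df_ml: id, judge, then TF-IDF columns.
df_ml = pd.DataFrame({
    'id': [53839, 53841, 53843],
    'judge': ['frye', 'per_curiam', 'parker_sarah'],
    '00': [0.0, 0.0, 0.002442],
    'your': [0.0, 0.0, 0.0],
})

# Drop the non-feature columns by name rather than by position.
X = df_ml.drop(columns=['id', 'judge'])
y = df_ml['judge']

print(list(X.columns))  # ['00', 'your']
print(y.tolist())       # ['frye', 'per_curiam', 'parker_sarah']
```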

In [11]:
# split the data into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6, stratify=y)
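With the defaults used here, `train_test_split` holds out 25% of the rows. Passing `stratify=y` keeps each judge's share of opinions about the same in both halves. This matters because the classes are very unbalanced: `per_curiam` has a `judge_count` of 6440, while `frye` has 415. Here is a toy illustration with made-up labels.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 'a' is three times as common as 'b'.
y = np.array(['a'] * 60 + ['b'] * 20)
X = np.arange(len(y)).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=6, stratify=y)

# Default test_size is 0.25, so 20 of the 80 rows are held out.
print(len(y_train), len(y_test))  # 60 20

# The 3:1 class ratio is preserved in the test set: 15 'a' and 5 'b'.
print(Counter(y_test))
```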

### Logistic Regression

In [12]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=6)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
print('Accuracy on training set = {}'.format(lr.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(lr.score(X_test, y_test)))



Accuracy on training set = 0.581562391638183
Accuracy on test set = 0.49217356845650884


### Stochastic Gradient Descent Classifier

In [13]:
from sklearn.linear_model import SGDClassifier
sgdc = SGDClassifier(random_state=6)
sgdc.fit(X_train, y_train)
y_pred_sgdc = sgdc.predict(X_test)
print('Accuracy on training set = {}'.format(sgdc.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(sgdc.score(X_test, y_test)))

Accuracy on training set = 0.7290335364343389
Accuracy on test set = 0.5581038240538934


### Linear Support Vector Machine Classifier

In [14]:
from sklearn.svm import LinearSVC
svc = LinearSVC(random_state=6)
svc.fit(X_train, y_train)
y_pred_svc = svc.predict(X_test)
print('Accuracy on training set = {}'.format(svc.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(svc.score(X_test, y_test)))

Accuracy on training set = 0.9444361883060055
Accuracy on test set = 0.6814444224291658


### Multinomial Naive Bayes

In [16]:
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
nb.fit(X_train, y_train)
y_pred_nb = nb.predict(X_test)
print('Accuracy on training set = {}'.format(nb.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(nb.score(X_test, y_test)))

Accuracy on training set = 0.23239378477898318
Accuracy on test set = 0.21111551416683177


### Random Forest Classifier

In [19]:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=900, max_depth=9, min_samples_split=600, min_samples_leaf=50, random_state=6)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print('Accuracy on training set = {}'.format(rf.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(rf.score(X_test, y_test)))

Accuracy on training set = 0.24644573240204093
Accuracy on test set = 0.22770953041410738


### Summary

Here is a summary of the results:

| Model (min_df=0.02) | Train  | Test   |
|---------------------|--------|--------|
| Linear SVM          | 0.9444 | 0.6814 |
| Logistic Regression | 0.5816 | 0.4922 |
| SGD                 | 0.7290 | 0.5581 |
| Random Forest       | 0.2464 | 0.2277 |
| Naive Bayes         | 0.2324 | 0.2111 |
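The table above can also be built programmatically, which avoids transcription slips. The sketch below assumes the same fit/score pattern as the cells above. It uses synthetic data from `make_classification` as a stand-in for the TF-IDF features so that it runs on its own.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the TF-IDF features and judge labels.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           n_classes=3, random_state=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6, stratify=y)

models = {
    'Linear SVM': LinearSVC(random_state=6),
    'Logistic Regression': LogisticRegression(random_state=6, max_iter=1000),
    'SGD': SGDClassifier(random_state=6),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    rows.append({'Model': name,
                 'Train': model.score(X_train, y_train),
                 'Test': model.score(X_test, y_test)})

# One row per model, sorted by test accuracy and rounded to 4 places like the table above.
summary = (pd.DataFrame(rows)
           .set_index('Model')
           .sort_values('Test', ascending=False)
           .round(4))
print(summary)
```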

Three of the classifiers (Linear SVM, Logistic Regression, and SGD) performed substantially better with the vectorized text. I will drop the Random Forest and Naive Bayes models, since neither reached 0.25 accuracy on the training or the test set. Next, let's see how the Linear SVM, Logistic Regression, and SGD models do with an expanded vocabulary of 16,905 words/features.

In [3]:
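# build stop words from the judge labels plus name fragments and titles ('justice', 'per', 'curiam'),
# so the vectorizer cannot pick up the author's name from the opinion header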
stop_words = list(set(df['judge']))
add_stop_words = ['clark', 'johnson', 'martin', 'parker', 'timmons', 'goodson', 'walker', 'judge', 'justice', 'per', 'curiam']
stop_words += add_stop_words

In [4]:
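# vectorize the corpus using TfidfVectorizer; a float min_df of 0.001 keeps only
# terms that appear in at least 0.1% of the opinions
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words=stop_words, min_df=0.001)
data_tfidf = tfidf.fit_transform(df['opinion'])
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() (1.0+) replaces it
df_tfidf = pd.DataFrame(data_tfidf.toarray(), columns=tfidf.get_feature_names_out())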
df_tfidf.index = df.index

df_tfidf.head()

Unnamed: 0,00,000,01,02,03,04,05,050,06,07,08,09,0f,10,100,1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,101,1010,1011,1012,1013,1014,1015,1016,1017,1018,1019,102,1020,1021,1022,1023,1024,1025,1026,1027,1028,1029,103,1030,1031,1032,1033,1035,1036,1037,1038,1039,104,1040,1041,1042,1043,1044,1045,1046,1047,1048,1049,105,1050,1051,1052,1053,1054,1055,1056,1057,1059,106,1060,1061,1062,1063,1064,1065,1066,1067,1068,1069,107,1070,1072,1073,1074,1075,1076,1077,1078,1079,108,1080,1081,1082,1083,1084,1086,1087,1089,108a,109,1090,1091,1092,1093,1094,1095,1096,1097,1098,1099,10a,10th,11,110,1100,1101,1102,1103,1104,1105,1106,1107,1108,1109,111,1110,1111,1112,1113,1114,1117,112,1120,1122,1123,1124,1125,1126,1127,1128,1129,113,1130,1131,1133,1134,1135,1137,1138,1139,113a,114,1143,1145,1147,1148,1149,115,1152,1154,1156,1159,115c,116,1160,1161,1164,1165,1167,1169,117,1178,118,1180,1181,1183,1189,119,1194,1197,1199,11th,12,120,1200,1201,1205,1206,1207,1208,1209,121,1210,1211,1212,1213,1214,1215,122,1221,1222,1225,1226,1227,1228,1229,122c,123,1230,1231,1232,1233,1234,1235,1236,1237,124,1240,1241,1242,1243,1245,1246,1247,1249,125,1250,1251,1253,1254,126,127,128,1283,129,12th,13,130,1300,1302,...,willingness,willis,williston,willoughby,wills,wilmington,wilson,win,winbobne,winboene,winchester,wind,winders,windfall,winding,windley,window,windows,winds,windshield,wine,winfield,wing,wingo,winn,winner,winning,winslow,winstead,winston,winter,winters,wipe,wiped,wire,wired,wires,wiring,wis,wisconsin,wisdom,wise,wisely,wiseman,wiser,wisest,wish,wished,wishes,wishing,wit,witb,witbin,with,withdraw,withdrawal,withdrawals,withdrawing,withdrawn,withdraws,withdrew,withers,witherspoon,withheld,withhold,withholding,withholds,within,without,withstand,witness,witnessed,witnesses,witnessing,witt,wives,wjhere,wjhether,wl,wm,wms,wo,woke,wolf,wolfe,womack,woman,womble,women,won,wonder,wong,wood,woodard,wooded,wooden,woodfin,woodhouse,woodland,woodlief,woodmen,woodruff,woods,woodson,woodward,woody,wool,w
oolard,wooten,word,worded,wording,words,wore,work,workable,worked,worker,workers,working,workman,workmanlike,workmanship,workmen,workplace,works,worksheet,worland,world,worley,worn,worried,worry,worse,worsened,worship,worsley,worst,worth,wortham,worthington,worthless,worthy,would,wouldn,wound,wounded,wounding,wounds,wrapped,wras,wray,wreck,wrecked,wrecker,wrenn,wright,wrightsville,wrist,wrists,writ,write,writer,writers,writes,writing,writings,writs,written,wrong,wrongdoer,wrongdoers,wrongdoing,wronged,wrongful,wrongfully,wrongly,wrongs,wrote,wrought,wyatt,wynne,wyo,wás,wé,xi,xii,xiii,xiv,ya,yadkin,yale,yance,yancey,yarborough,yarbrough,yard,yards,yarn,yates,ye,yeah,year,yearly,years,yell,yelled,yelling,yellow,yelverton,yes,yesterday,yet,yield,yielded,yielding,yields,yii,yirginia,yol,yon,york,you,young,youngblood,younger,youngest,yount,your,yours,yourself,yourselves,youth,youthful,yow,zachary,zeal,zealous,zero,zimmerman,zone,zoned,zones,zoning,zuniga,zurich,ánd,áre,ás,óf
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005499,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015969,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015589,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007438,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010568,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005518,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011068,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01532,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005415,0.006923,0.0,0.0,0.0,0.008374,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007977,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00561,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006881,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008196,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0
2,0.002282,0.016448,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002893,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011004,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005468,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002046,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002775,0.0,0.0,0.0,0.0,0.0,0.0,0.002841,0.0,0.0,0.0,0.0,0.001943,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002882,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025303,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.008406,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003588,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017213,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011109,0.0,0.0,0.0,0.0,0.0,0.045348,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002829,0.007232,0.0,0.021621,0.0,0.017496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006554,0.0,0.0,0.0,0.003421,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.007688,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00211,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00211,0.0,0.003594,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010542,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046031,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.003825,0.0,0.0,0.006578,0.0,0.006501,0.0,0.0,0.0,0.0,0.0,0.0,0.008856,0.003916,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008488,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004702,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004574,0.0,0.0,0.0,0.003433,0.013371,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0304,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004605,0.017661,0.0,0.0,0.0,0.003561,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015604,0.0,0.018257,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004284,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005904,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013567,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005531,0.0,0.0,0.0,0.0,0.005981,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004546,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003435,0.0,0.011704,0.0,0.0,0.0,0.0,0.0,0.004504,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.034325,0.004844,0.0,0.0,0.0,0.0,0.004159,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.020194,0.0,0.0,0.019954,0.0,0.0,0.0,0.0,0.020754,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.110575,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013527,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014033,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013864,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013329,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007067,0.018068,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015615,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [5]:
# create the machine learning dataframe from the id and judge (label) columns
df_ml = df.loc[:, ['id', 'judge']]

# add the vectorized vocabulary to the df_ml dataframe
df_ml = pd.concat([df_ml, df_tfidf], axis=1)
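The concatenation above depends on `df_ml` and `df_tfidf` sharing the same row index. `pd.concat(..., axis=1)` aligns rows by index label, not by position. The sketch below uses hypothetical toy frames to show what happens when the indexes match and when they do not:

```python
import pandas as pd

# Hypothetical toy frames standing in for the id/judge columns and the TF-IDF matrix.
meta = pd.DataFrame({'id': [101, 102, 103], 'judge': ['frye', 'webb', 'frye']})
tfidf = pd.DataFrame({'court': [0.1, 0.0, 0.3], 'jury': [0.0, 0.2, 0.0]})

# Matching indexes (0, 1, 2): rows line up and no values are missing.
assert meta.index.equals(tfidf.index)
combined = pd.concat([meta, tfidf], axis=1)
print(combined.shape)  # (3, 4)

# Mismatched indexes, e.g. after rows were dropped or reordered in one frame only.
shifted = tfidf.copy()
shifted.index = [1, 2, 3]
misaligned = pd.concat([meta, shifted], axis=1)
print(misaligned.shape)               # (4, 4): the union of both indexes
print(misaligned.isna().any().any())  # True: rows 0 and 3 are only half filled
```

A cheap check such as `df_ml.index.equals(df_tfidf.index)` before the concat would catch this kind of silent misalignment.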

In [6]:
# split the columns into features (every column after 'id' and 'judge') and the label
X = df_ml.iloc[:, 2:]
y = df_ml.loc[:, 'judge']

# hold out a stratified test set (25% of rows by default) so each judge's share of opinions is preserved in both splits
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6, stratify=y)
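The split above relies on scikit-learn's default test size, which holds out 25% of rows. It also passes `stratify=y`, which keeps each label's proportion the same in both splits. Here is a small sketch with hypothetical, imbalanced labels:

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 40 'a', 40 'b', 20 'c' (100 rows in total).
y_toy = ['a'] * 40 + ['b'] * 40 + ['c'] * 20
X_toy = [[i] for i in range(len(y_toy))]

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, random_state=6, stratify=y_toy)

print(len(y_tr), len(y_te))           # 75 25
print(sorted(Counter(y_te).items()))  # [('a', 10), ('b', 10), ('c', 5)]
```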

### Logistic Regression

In [7]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=6)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
print('Accuracy on training set = {}'.format(lr.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(lr.score(X_test, y_test)))



Accuracy on training set = 0.6063638315087267
Accuracy on test set = 0.4777095304141074


### Stochastic Gradient Descent Classifier

In [8]:
from sklearn.linear_model import SGDClassifier
sgdc = SGDClassifier(random_state=6)
sgdc.fit(X_train, y_train)
y_pred_sgdc = sgdc.predict(X_test)
print('Accuracy on training set = {}'.format(sgdc.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(sgdc.score(X_test, y_test)))

Accuracy on training set = 0.8424398540314724
Accuracy on test set = 0.5780661779274817


### Linear Support Vector Machine Classifier

In [9]:
from sklearn.svm import LinearSVC
svc = LinearSVC(random_state=6)
svc.fit(X_train, y_train)

y_pred_svc = svc.predict(X_test)
print('Accuracy on training set = {}'.format(svc.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(svc.score(X_test, y_test)))

Accuracy on training set = 0.9884909430161325
Accuracy on test set = 0.6752526253219735


### Summary

Here is a summary of the results for the two vocabularies (min_df=0.02 yields 3,677 words; min_df=0.001 yields 16,905 words):

| Model               | Train (min_df=0.02) | Test (min_df=0.02) | Train (min_df=0.001) | Test (min_df=0.001) |
|---------------------|---------------------|--------------------|----------------------|---------------------|
| Linear SVM          | 0.9044              | 0.6814             | 0.9885               | 0.6753              |
| Logistic Regression | 0.5816              | 0.4922             | 0.6064               | 0.4777              |
| SGD                 | 0.7290              | 0.5581             | 0.8424               | 0.5781              |

Expanding the vocabulary from 3,677 to 16,905 words (lowering `min_df` from 0.02 to 0.001) did not meaningfully improve model performance. The Linear SVM and Logistic Regression models actually performed worse on the test set, and the SGD Classifier improved by only about 2 percentage points (0.5581 to 0.5781). Training accuracy rose for all three models at the same time, so the larger vocabulary mainly widened the gap between training and test performance. This is not surprising. The added words are used less often and are likely less informative, which suggests that expanding the vocabulary even further is unlikely to help. A larger vocabulary would also let in more OCR errors, which we know are an issue with this dataset.

For these reasons, I will proceed with the 3,677-word vocabulary. It makes model building more efficient without sacrificing much, if any, performance.
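The link between `min_df` and vocabulary size can be seen on a toy corpus. This sketch assumes the vocabularies were built with scikit-learn's `TfidfVectorizer`, with `min_df` given as a fraction of documents. The vectorizer call itself is not shown in this section, and the corpus is hypothetical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus of four "opinions".
docs = [
    'the court affirmed the judgment',
    'the court reversed the judgment',
    'the jury returned a verdict',
    'the defendant appealed the verdict',
]

sizes = {}
for min_df in (0.5, 0.25):
    # A float min_df drops terms that appear in fewer than min_df * n_docs documents.
    vec = TfidfVectorizer(min_df=min_df).fit(docs)
    sizes[min_df] = len(vec.vocabulary_)
    print(min_df, sizes[min_df])
# 0.5 keeps 4 terms (the, court, judgment, verdict); 0.25 keeps all 10
# (single-character tokens such as "a" are dropped by the default token pattern)
```

Lowering `min_df` admits rarer terms. On the opinions corpus, those rarer terms are where low-frequency words and OCR misspellings accumulate.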

### Combined Feature Performance

Next, I will combine the categorical and continuous variables with the TF-IDF-vectorized vocabulary to see how the models perform with both sets of features.

In [2]:
# load the categorical and continuous variables
df_ml = pd.read_csv('df_ml_1.csv', index_col=0)
df_ml.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 80749 entries, 0 to 80748
Columns: 243 entries, id to subjectivity
dtypes: float64(5), int64(237), object(1)
memory usage: 150.3+ MB


In [3]:
# load the tfidf vectorized vocabulary
df_tfidf = pd.read_csv('df_tfidf_02.csv', index_col=0)
df_tfidf.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 80749 entries, 0 to 80748
Columns: 3677 entries, 00 to your
dtypes: float64(3677)
memory usage: 2.2 GB


In [5]:
# add the vectorized vocabulary to the df_ml dataframe
df_ml = pd.concat([df_ml, df_tfidf], axis=1)
df_ml.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 80749 entries, 0 to 80748
Columns: 3920 entries, id to your
dtypes: float64(3682), int64(237), object(1)
memory usage: 2.4+ GB
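The combined DataFrame takes about 2.4 GB, even though most TF-IDF entries are zero. One alternative is to keep the features in a SciPy sparse matrix. This assumes the feature columns could be rebuilt rather than reloaded from CSV. scikit-learn's `LogisticRegression`, `LinearSVC` and `SGDClassifier` all accept sparse input. The sizes and random data in this sketch are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy import sparse

rng = np.random.default_rng(6)

# Hypothetical stand-ins: a mostly-zero "TF-IDF" block and two metadata columns.
dense_tfidf = rng.random((1000, 500))
dense_tfidf[dense_tfidf < 0.98] = 0.0  # roughly 98% zeros, like a TF-IDF matrix
meta = pd.DataFrame({'word_count': rng.random(1000),
                     'type_majority': rng.integers(0, 2, 1000)})

# Stack side by side in compressed sparse row (CSR) form; zeros are not stored.
X_sparse = sparse.hstack([sparse.csr_matrix(meta.to_numpy(dtype=float)),
                          sparse.csr_matrix(dense_tfidf)], format='csr')

dense_bytes = dense_tfidf.nbytes + meta.to_numpy(dtype=float).nbytes
sparse_bytes = X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
print(X_sparse.shape)              # (1000, 502)
print(sparse_bytes < dense_bytes)  # True
```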


In [6]:
df_ml.head()

Unnamed: 0,id,judge,type_concurrence,type_concurring-in-part-and-dissenting-in-part,type_dissent,type_majority,type_rehearing,year_1779,year_1784,year_1787,year_1789,year_1790,year_1791,year_1792,year_1793,year_1794,year_1795,year_1796,year_1797,year_1798,year_1799,year_1800,year_1801,year_1802,year_1803,year_1804,year_1805,year_1806,year_1807,year_1808,year_1809,year_1810,year_1811,year_1812,year_1813,year_1814,year_1815,year_1816,year_1817,year_1818,year_1819,year_1820,year_1821,year_1822,year_1823,year_1824,year_1825,year_1826,year_1827,year_1828,year_1829,year_1830,year_1831,year_1832,year_1833,year_1834,year_1835,year_1836,year_1837,year_1838,year_1839,year_1840,year_1841,year_1842,year_1843,year_1844,year_1845,year_1846,year_1847,year_1848,year_1849,year_1850,year_1851,year_1852,year_1853,year_1854,year_1855,year_1856,year_1857,year_1858,year_1859,year_1860,year_1861,year_1862,year_1863,year_1864,year_1866,year_1867,year_1868,year_1869,year_1870,year_1871,year_1872,year_1873,year_1874,year_1875,year_1876,year_1877,year_1878,year_1879,year_1880,year_1881,year_1882,year_1883,year_1884,year_1885,year_1886,year_1887,year_1888,year_1889,year_1890,year_1891,year_1892,year_1893,year_1894,year_1895,year_1896,year_1897,year_1898,year_1899,year_1900,year_1901,year_1902,year_1903,year_1904,year_1905,year_1906,year_1907,year_1908,year_1909,year_1910,year_1911,year_1912,year_1913,year_1914,year_1915,year_1916,year_1917,year_1918,year_1919,year_1920,year_1921,year_1922,year_1923,year_1924,year_1925,year_1926,year_1927,year_1928,year_1929,year_1930,year_1931,year_1932,year_1933,year_1934,year_1935,year_1936,year_1937,year_1938,year_1939,year_1940,year_1941,year_1942,year_1943,year_1944,year_1945,year_1946,year_1947,year_1948,year_1949,year_1950,year_1951,year_1952,year_1953,year_1954,year_1955,year_1956,year_1957,year_1958,year_1959,year_1960,year_1961,year_1962,year_1963,year_1964,year_1965,year_1966,year_1967,year_1968,year_1969,year_1970,year_1971,year_1972,year_1973,year
_1974,year_1975,year_1976,year_1977,year_1978,year_1979,year_1980,year_1981,year_1982,year_1983,year_1984,year_1985,year_1986,year_1987,year_1988,year_1989,year_1990,year_1991,year_1992,year_1993,year_1994,year_1995,year_1996,year_1997,year_1998,year_1999,year_2000,year_2001,year_2002,year_2003,year_2004,year_2005,year_2006,year_2007,year_2008,year_2009,year_2010,year_2011,year_2012,year_2013,year_2014,year_2015,year_2016,year_2017,word_count,sentence_count,avg_sent_length,polarity,subjectivity,00,000,10,100,101,102,103,...,threat,threatened,three,through,throughout,thus,time,timely,times,tion,tire,title,to,today,together,told,too,took,top,tort,total,totally,toward,towards,town,track,tract,trade,traffic,train,training,transaction,transactions,transcript,transfer,transferred,transportation,travel,traveling,treat,treated,treating,treatment,trespass,trial,trials,tried,trouble,truck,true,trust,trustee,trustees,truth,try,trying,turn,turned,turner,twelve,twenty,twice,two,type,types,ultimate,ultimately,unable,uncertain,unconstitutional,uncontradicted,under,underlying,understand,understanding,understood,undertake,undertaking,undertook,undisputed,undoubtedly,undue,unfair,uniform,union,unit,united,university,unjust,unknown,unlawful,unlawfully,unless,unlike,unnecessary,unreasonable,unsupported,until,unusual,up,upheld,upon,urged,us,use,used,uses,using,usual,usually,va,vacate,vacated,valid,validity,valuable,value,variance,various,vehicle,vehicles,venire,verdict,verdicts,verified,version,very,vested,vi,victim,view,viewed,views,violate,violated,violates,violating,violation,violations,violence,virginia,virtue,visit,void,voir,vol,voluntarily,voluntary,vote,waive,waived,waiver,wake,walked,walking,wall,want,wanted,ward,warning,warrant,warranted,warrants,warranty,was,washington,water,watson,way,ways,wbicb,we,weapon,week,weeks,weigh,weight,welfare,well,went,were,west,what,whatever,whatsoever,when,whenever,where,whereas,whereby,wherein,whether,which,while,white,who,whole,wholly,whom,whos
e,why,wide,widow,wife,will,willful,willfully,william,willing,wilmington,wilson,window,winston,wish,wit,with,withdraw,within,without,withstand,witness,witnesses,woman,wood,word,words,work,worked,worker,workers,working,works,worth,would,wright,writ,writing,written,wrong,wrongful,wrongfully,wrote,year,years,yes,yet,york,you,young,your
0,53839,frye,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.088804,0.098247,0.093527,0.501317,0.473721,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.007668,0.0,0.0,0.0,0.0,0.022929,0.004798,0.0,0.0,0.0,0.0,0.0,0.187782,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009686,0.0,0.0,0.0,0.0,0.00386,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006326,0.005294,0.0,0.011254,0.0,0.0,0.0,0.0,0.0,0.0,0.027642,0.0,0.015266,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013997,0.0,0.0,0.0,0.016883,0.0,0.0,0.0,0.005466,0.0,0.0,0.003559,0.0,0.010877,0.005882,0.016425,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004766,0.005484,0.0,0.0,0.0,0.00541,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007436,0.006515,0.010877,0.027885,0.007414,0.018138,0.0,0.0,0.0,0.0,0.0,0.005311,0.0,0.0,0.0,0.0,0.013854,0.0,0.0,0.0,0.020446,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053208,0.0,0.0,0.0,0.0,0.0,0.0,0.057681,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011127,0.0,0.0,0.0,0.0,0.010918,0.0,0.005309,0.0,0.0,0.0,0.042082,0.018303,0.009107,0.00562,0.002685,0.044954,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017038,0.0,0.006023,0.007699,0.0,0.0,0.009313,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008871,0.0,0.006239,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007653,0.0,0.0,0.020184,0.0,0.0,0.0
1,53841,per_curiam,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000265,0.000923,0.018977,0.444444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,53843,parker_sarah,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.192718,0.186347,0.108294,0.437718,0.321622,0.002442,0.017602,0.005822,0.0,0.0,0.0,0.003096,...,0.0,0.0,0.003665,0.003868,0.0,0.019757,0.019295,0.0,0.00999,0.0,0.003755,0.0,0.243296,0.0,0.014548,0.059761,0.0,0.008492,0.0,0.0,0.0,0.003812,0.0,0.003348,0.002964,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015926,0.0,0.0,0.0,0.0,0.0,0.0,0.005705,0.0,0.0,0.0,0.073032,0.0,0.004812,0.0,0.003297,0.00194,0.007575,0.0,0.0,0.0,0.003383,0.0,0.00277,0.0,0.0,0.0,0.005322,0.003423,0.016971,0.002947,0.0,0.0,0.0,0.002552,0.0,0.003474,0.0,0.016444,0.0,0.0,0.006173,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003403,0.0,0.0,0.0,0.007035,0.0,0.0,0.0,0.0,0.003486,0.001775,0.0,0.0,0.0,0.0,0.003578,0.0,0.003645,0.0,0.003096,0.003635,0.00184,0.021757,0.005455,0.0,0.0,0.0,0.0,0.0,0.0,0.014283,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005947,0.0,0.0,0.0,0.010273,0.0,0.0,0.263759,0.0,0.0,0.0,0.00655,0.021869,0.0,0.0,0.002279,0.0,0.0,0.0,0.0,0.0,0.0,0.003462,0.0,0.012092,0.0,0.0,0.0,0.002863,0.006502,0.0,0.0,0.0,0.0,0.002739,0.015687,0.0,0.0,0.0,0.0,0.003859,0.0,0.09749,0.0,0.0,0.0,0.001927,0.0,0.0,0.037991,0.0,0.0,0.003259,0.0,0.0,0.0,0.009572,0.007075,0.031321,0.0,0.00469,0.0,0.0,0.009878,0.0,0.008006,0.0,0.0,0.0,0.016176,0.0138,0.003052,0.0,0.008099,0.0,0.0,0.0,0.0,0.015291,0.00342,0.003719,0.021804,0.001275,0.0,0.003672,0.0,0.0,0.0,0.008996,0.00384,0.0,0.0,0.0,0.048532,0.0,0.003027,0.00774,0.0,0.023139,0.018724,0.0,0.0,0.0,0.0,0.007014,0.0,0.0,0.003661,0.0,0.0,0.0,0.033444,0.0,0.0,0.0,0.002258,0.0,0.0,0.0,0.0,0.002258,0.003847,0.0,0.
0,0.0,0.011282,0.0,0.0
3,53847,whichard,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.124379,0.123155,0.105457,0.430596,0.415121,0.0,0.004179,0.009676,0.004278,0.0,0.0,0.0,...,0.0,0.0,0.036542,0.003214,0.0,0.010945,0.012025,0.0,0.0,0.0,0.0,0.0,0.255585,0.0,0.0,0.0191,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010587,0.0,0.0,0.0,0.0,0.0,0.005815,0.00474,0.0,0.005166,0.0,0.113281,0.0,0.003998,0.0,0.0,0.003225,0.0,0.0,0.0,0.004892,0.005622,0.0,0.009206,0.0,0.0,0.0,0.013268,0.0,0.039956,0.0,0.0,0.0,0.0,0.004241,0.0,0.011546,0.006353,0.014575,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005655,0.0,0.0,0.0,0.003897,0.0,0.0,0.0,0.0,0.0,0.002949,0.0,0.0,0.0,0.00591,0.0,0.023181,0.003029,0.0,0.008575,0.006041,0.003058,0.012053,0.0,0.0,0.004657,0.0,0.0,0.0,0.005719,0.004747,0.003981,0.0,0.0,0.033907,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019701,0.0,0.0,0.012424,0.010885,0.009086,0.0,0.0,0.011364,0.0,0.01177,0.0,0.0,0.0,0.0,0.023016,0.0,0.0,0.0,0.0,0.005918,0.0,0.005403,0.0,0.0,0.0,0.0,0.004553,0.005214,0.0,0.0,0.004234,0.00588,0.0,0.0,0.07169,0.0,0.0,0.0,0.0,0.0,0.0,0.054831,0.005187,0.0,0.0,0.019312,0.008513,0.0,0.015908,0.007838,0.03718,0.0,0.018186,0.0,0.006259,0.016416,0.0,0.013305,0.0,0.0,0.0,0.035154,0.033637,0.017751,0.00939,0.017946,0.0,0.0,0.0,0.0,0.0,0.011368,0.0,0.0,0.012718,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033211,0.0,0.005031,0.019294,0.0,0.0,0.00389,0.017047,0.0,0.00468,0.0,0.0,0.0,0.00645,0.0,0.0,0.0,0.0,0.014822,0.006043,0.0,0.0,0.0,0.004966,0.0,0.0,0.0,0.003753,0.012786,0.004921,0.0,0.0,0.0375,0.005292,0.004544
4,53848,webb,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.027348,0.034133,0.081419,0.473283,0.415155,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.009283,0.0,0.0,0.008342,0.0,0.0,0.0,0.0,0.0,0.0,0.123679,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.263985,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035646,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014473,0.012138,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016592,0.01385,0.0,0.0,0.0,0.0,0.0,0.018191,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034969,0.0,0.0,0.0,0.0,0.0,0.0,0.096243,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011335,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012608,0.03729,0.0,0.0,0.0,0.011448,0.0,0.0,0.0,0.012912,0.0,0.0,0.0,0.025847,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014464,0.0,0.007669,0.019607,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016945,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [7]:
# split the columns into features (every column after 'id' and 'judge') and the label
X = df_ml.iloc[:, 2:]
y = df_ml.loc[:, 'judge']

# hold out a stratified test set (25% of rows by default) so each judge's share of opinions is preserved in both splits
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6, stratify=y)

### Logistic Regression

In [8]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=6)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
print('Accuracy on training set = {}'.format(lr.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(lr.score(X_test, y_test)))



Accuracy on training set = 0.7363649873681082
Accuracy on test set = 0.6236378046364177


### Stochastic Gradient Descent Classifier

In [9]:
from sklearn.linear_model import SGDClassifier
sgdc = SGDClassifier(random_state=6)
sgdc.fit(X_train, y_train)
y_pred_sgdc = sgdc.predict(X_test)
print('Accuracy on training set = {}'.format(sgdc.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(sgdc.score(X_test, y_test)))

Accuracy on training set = 0.7165007182840442
Accuracy on test set = 0.5903011690112938


### Linear Support Vector Machine Classifier

In [10]:
from sklearn.svm import LinearSVC
svc = LinearSVC(random_state=6)
svc.fit(X_train, y_train)
y_pred_svc = svc.predict(X_test)
print('Accuracy on training set = {}'.format(svc.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(svc.score(X_test, y_test)))

Accuracy on training set = 0.9654563167715197
Accuracy on test set = 0.7418763621953636


### Summary

Here are the results:

| Model               | Train  | Test   |
|---------------------|--------|--------|
| Linear SVM          | 0.9655 | 0.7419 |
| Logistic Regression | 0.7364 | 0.6236 |
| SGD                 | 0.7165 | 0.5903 |

Combining the features substantially improved test accuracy for two models. The Linear SVM rose from 0.6814 to 0.7419, and Logistic Regression rose from 0.4922 to 0.6236. Both models achieve reasonable accuracy, considering that each must identify the correct author from a pool of over 150 judges. The SGD Classifier improved less (0.5581 to 0.5903) and now has the lowest test accuracy, so I have elected to move forward without it.
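A majority-class baseline gives these accuracies a concrete point of comparison. With imbalanced authorship, always guessing the most prolific "judge" can score well above 1/150. The sketch below uses scikit-learn's `DummyClassifier` on hypothetical toy labels:

```python
from sklearn.dummy import DummyClassifier

# Hypothetical toy labels standing in for the judge column: one frequent author.
y_toy = ['per_curiam'] * 5 + ['frye'] * 3 + ['webb'] * 2
X_toy = [[0] for _ in y_toy]  # features are ignored by this strategy

baseline = DummyClassifier(strategy='most_frequent').fit(X_toy, y_toy)
print(baseline.predict([[0]]))       # ['per_curiam']
print(baseline.score(X_toy, y_toy))  # 0.5
```

On the real data, the equivalent calls would be `baseline.fit(X_train, y_train)` and `baseline.score(X_test, y_test)`. That baseline has not been computed in this notebook.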

## Model Hyperparameter Tuning

Now that I have selected the two best-performing classifiers, I will tune their hyperparameters to see whether that further improves their performance.

### Logistic Regression

First, I will try the `newton-cg` solver to see whether it improves on the `lbfgs` solver, which is the default in scikit-learn 0.22 and later:

In [None]:
from sklearn.linear_model import LogisticRegression
solver_list = ['newton-cg']
for solver in solver_list:
    lr = LogisticRegression(solver=solver, multi_class='auto')
    lr.fit(X_train, y_train)
    y_pred_lr = lr.predict(X_test)
    print('Solver value {} Training Set Accuracy: {}'.format(solver, lr.score(X_train, y_train)))
    print('Solver value {} Test Set Accuracy: {}'.format(solver, lr.score(X_test, y_test)))

Solver value newton-cg Training Set Accuracy: 0.7517214048645168
Solver value newton-cg Test Set Accuracy: 0.6322072518327719


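The `newton-cg` solver produced the same test accuracy (0.6322) as `lbfgs` at C = 1 in the sweep below, and the two training accuracies differ only in the fifth decimal place. I will therefore stick with `lbfgs`. Next, I will tune the C parameter, which is the inverse of the regularization strength: smaller values of C apply a stronger penalty to the model's coefficients.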
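Because C is an inverse regularization strength, a smaller C should shrink the learned coefficients. Here is a quick sketch on synthetic data (not the opinions dataset), using the default L2 penalty:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration only.
X_syn, y_syn = make_classification(n_samples=500, n_features=20, random_state=6)

norms = {}
for c in (0.01, 1, 100):
    model = LogisticRegression(C=c, max_iter=1000).fit(X_syn, y_syn)
    # Euclidean norm of the weight vector (the intercept is not penalized).
    norms[c] = np.linalg.norm(model.coef_)
    print(c, round(norms[c], 3))
```

The norms grow with C. This matches the sweep below, where very small C values underfit and very large ones approach perfect training accuracy.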

In [8]:
from sklearn.linear_model import LogisticRegression
c_list = [0.01, 0.1, 1, 10, 100, 1000, 10000]
for c in c_list:
    lr = LogisticRegression(C=c, solver='lbfgs', multi_class='auto', max_iter=1000)
    lr.fit(X_train, y_train)
    y_pred_lr = lr.predict(X_test)
    print('C value {} Training Set Accuracy: {}'.format(c, lr.score(X_train, y_train)))
    print('C value {} Test Set Accuracy: {}'.format(c, lr.score(X_test, y_test)))

C value 0.01 Training Set Accuracy: 0.13835636795957795
C value 0.01 Test Set Accuracy: 0.13552605508222706
C value 0.1 Training Set Accuracy: 0.41714965076534405
C value 0.1 Test Set Accuracy: 0.37859124232217156
C value 1 Training Set Accuracy: 0.7516718680338832
C value 1 Test Set Accuracy: 0.6322072518327719
C value 10 Training Set Accuracy: 0.9649609484651839
C value 10 Test Set Accuracy: 0.7231523677432138




C value 100 Training Set Accuracy: 0.9997027790161985
C value 100 Test Set Accuracy: 0.7294927679809788




C value 1000 Training Set Accuracy: 0.9999009263387328
C value 1000 Test Set Accuracy: 0.7243907271646522




C value 10000 Training Set Accuracy: 0.9999009263387328
C value 10000 Test Set Accuracy: 0.720229839508619
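Recording each run and letting pandas sort and round the results is one way to build summary tables like the ones in this notebook without copying numbers by hand. The helper name `summarize` below is hypothetical. The three sample records use scores printed by the runs above:

```python
import pandas as pd

def summarize(runs):
    """Turn a list of run records into a table sorted by test accuracy."""
    table = pd.DataFrame(runs, columns=['Model', 'C Value', 'Train', 'Test'])
    return table.sort_values('Test', ascending=False).round(4).reset_index(drop=True)

# Inside the sweep loop, each iteration could append its own record, e.g.
# runs.append({'Model': 'Logistic Regression', 'C Value': c,
#              'Train': lr.score(X_train, y_train), 'Test': lr.score(X_test, y_test)})
runs = [
    {'Model': 'Logistic Regression', 'C Value': 1,
     'Train': 0.7516718680338832, 'Test': 0.6322072518327719},
    {'Model': 'Logistic Regression', 'C Value': 100,
     'Train': 0.9997027790161985, 'Test': 0.7294927679809788},
    {'Model': 'Logistic Regression', 'C Value': 0.01,
     'Train': 0.13835636795957795, 'Test': 0.13552605508222706},
]
table = summarize(runs)
print(table)
```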


Here is a summary of the hyperparameter tuning results for the Logistic Regression classifier, sorted by test accuracy:

| Model               | Solver    | C Value | Train  | Test   |
|---------------------|-----------|---------|--------|--------|
| Logistic Regression | lbfgs     | 100     | 0.9997 | 0.7295 |
| Logistic Regression | lbfgs     | 1000    | 0.9999 | 0.7244 |
| Logistic Regression | lbfgs     | 10      | 0.9650 | 0.7232 |
| Logistic Regression | lbfgs     | 10000   | 0.9999 | 0.7202 |
| Logistic Regression | lbfgs     | 1       | 0.7517 | 0.6322 |
| Logistic Regression | newton-cg | 1       | 0.7517 | 0.6322 |
| Logistic Regression | lbfgs     | 0.1     | 0.4171 | 0.3786 |
| Logistic Regression | lbfgs     | 0.01    | 0.1384 | 0.1355 |

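The sweep suggests that C = 100 gives marginally the best test accuracy: 0.7295, versus 0.7244 at C = 1000 and 0.7232 at C = 10. Training accuracy is 0.965 or higher for every C of 10 or more. The model therefore fits the training data almost perfectly, and the differences on the test set are small. I will also need to raise `max_iter` above 1,000 for the solver to converge at C = 100.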
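The sweeps in this notebook choose C by test-set accuracy, which makes the reported test score somewhat optimistic. One common alternative is to choose C by cross-validation on the training split only and score the test set once at the end. This sketch uses synthetic stand-in data; in the notebook, `X_train`/`y_train` would replace `X_tr`/`y_tr`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with three classes.
X_syn, y_syn = make_classification(n_samples=600, n_features=30, n_informative=10,
                                   n_classes=3, random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=6, stratify=y_syn)

search = GridSearchCV(
    LogisticRegression(max_iter=5000),
    param_grid={'C': [0.01, 0.1, 1, 10, 100]},
    cv=5,                # stratified 5-fold CV, run on the training split only
    scoring='accuracy',
)
search.fit(X_tr, y_tr)

print(search.best_params_)       # C chosen by mean cross-validated accuracy
print(search.score(X_te, y_te))  # the test set is scored once, after C is fixed
```

On the full opinions dataset, this means 25 fits (5 folds × 5 values of C), so it would be considerably slower than the single sweep above.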

### Linear Support Vector Machine Classifier

For the Linear Support Vector Machine Classifier, I will tune the C parameter:

In [11]:
from sklearn.svm import LinearSVC
c_list = [0.01, 0.1, 1, 10, 100, 1000]
for c in c_list:
    svc = LinearSVC(C=c, max_iter=1500, random_state=6)
    svc.fit(X_train, y_train)
    y_pred_svc = svc.predict(X_test)
    print('C value {} Training Set Accuracy: {}'.format(c, svc.score(X_train, y_train)))
    print('C value {} Test Set Accuracy: {}'.format(c, svc.score(X_test, y_test)))

C value 0.01 Training Set Accuracy: 0.5204999917438615
C value 0.01 Test Set Accuracy: 0.456905092133941
C value 0.1 Training Set Accuracy: 0.8104885982728158
C value 0.1 Test Set Accuracy: 0.6641569249058847
C value 1 Training Set Accuracy: 0.9654563167715197
C value 1 Test Set Accuracy: 0.7418763621953636
C value 10 Training Set Accuracy: 0.9975561830220769
C value 10 Test Set Accuracy: 0.7231523677432138
C value 100 Training Set Accuracy: 0.9995706808011757
C value 100 Test Set Accuracy: 0.6945214979195562
C value 1000 Training Set Accuracy: 0.9995871930780535
C value 1000 Test Set Accuracy: 0.6842678819100456


The default C value of 1 produced the highest test-set accuracy (0.7419) for the Linear SVC model. Larger values push training accuracy toward 1.0 while test accuracy falls.
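Each C value above is judged by its test-set accuracy, so the test set also takes part in choosing the model. A hedged alternative is to score the same grid with cross-validation on the training split and use the test split only once, at the end. The sketch below assumes synthetic data, a 3-fold split, and `max_iter=5000`; none of these come from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the notebook's features and judge labels (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=4, random_state=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6)

# Same C grid as the loop above; each value is scored by 3-fold CV
# on the training split only.
search = GridSearchCV(
    LinearSVC(max_iter=5000, random_state=6),
    param_grid={"C": [0.01, 0.1, 1, 10, 100, 1000]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("best C:", search.best_params_["C"])
print("mean CV accuracy:", round(search.best_score_, 4))
# The held-out test split is scored once, with the refit best model.
print("test accuracy:", round(search.score(X_test, y_test), 4))
```

Large C values may still emit convergence warnings here; they do not stop the search.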

### Summary of Hyperparameter Tuning

Here is a summary of the results of the hyperparameter tuning:

| Model               | Solver    | C Value | Train  | Test   |
|---------------------|-----------|---------|--------|--------|
| Linear SVC          |           | 1       | 0.9655 | 0.7419 |
| Logistic Regression | lbfgs     | 100     | 0.9997 | 0.7295 |
| Logistic Regression | lbfgs     | 1000    | 0.9999 | 0.7244 |
| Linear SVC          |           | 10      | 0.9976 | 0.7232 |
| Logistic Regression | lbfgs     | 10      | 0.9650 | 0.7232 |
| Logistic Regression | lbfgs     | 10000   | 0.9999 | 0.7202 |
| Linear SVC          |           | 100     | 0.9996 | 0.6945 |
| Linear SVC          |           | 1000    | 0.9996 | 0.6843 |
| Linear SVC          |           | 0.1     | 0.8105 | 0.6642 |
| Logistic Regression | lbfgs     | 1       | 0.7517 | 0.6322 |
| Logistic Regression | newton-cg | 1       | 0.7517 | 0.6322 |
| Linear SVC          |           | 0.01    | 0.5205 | 0.4569 |
| Logistic Regression | lbfgs     | 0.1     | 0.4171 | 0.3786 |
| Logistic Regression | lbfgs     | 0.001   | 0.1384 | 0.1355 |
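The summary table can also be assembled and sorted with pandas instead of by hand. This sketch copies only four rows from the tuning output above. The `Gap` column (training minus test accuracy) is an added illustration of how far each model overfits and is not part of the original table.

```python
import pandas as pd

# A few (model, solver, C, train, test) rows copied from the tuning output above.
results = [
    ("Logistic Regression", "lbfgs", 10, 0.9649609484651839, 0.7231523677432138),
    ("Logistic Regression", "lbfgs", 100, 0.9997027790161985, 0.7294927679809788),
    ("Linear SVC", "", 1, 0.9654563167715197, 0.7418763621953636),
    ("Linear SVC", "", 10, 0.9975561830220769, 0.7231523677432138),
]

summary = (
    pd.DataFrame(results, columns=["Model", "Solver", "C Value", "Train", "Test"])
    # mergesort is stable, so tied Test scores keep their input order
    .sort_values("Test", ascending=False, kind="mergesort")
    .round(4)
    .reset_index(drop=True)
)
# Training minus test accuracy: a rough signal of overfitting.
summary["Gap"] = (summary["Train"] - summary["Test"]).round(4)
print(summary.to_string(index=False))
```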

After tuning, the best Linear SVC model (C=1) reaches a test-set accuracy of 74.19%, and the best Logistic Regression classifier (lbfgs, C=100) reaches 72.95%. Both models are performing well at this point, especially considering that approximately 150 judges authored opinions in the dataset: guessing uniformly at random among that many authors would be right less than 1% of the time.
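To make the comparison with chance concrete: guessing uniformly among about 150 judges is right about 1/150 ≈ 0.67% of the time. A stronger reference point is the majority-class baseline, which always predicts the most common author; scikit-learn's `DummyClassifier` computes it. The label vector below is hypothetical, because the real distribution of opinions per judge is not shown in this section.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Uniform random guessing among roughly 150 judges.
n_judges = 150
print(f"uniform-guess accuracy: {1 / n_judges:.2%}")

# Majority-class baseline on a hypothetical, imbalanced label vector
# (a stand-in for y_train; not the notebook's real judge distribution).
rng = np.random.default_rng(0)
y = rng.choice(["per_curiam", "frye", "webb", "whichard"], size=1000,
               p=[0.4, 0.3, 0.2, 0.1])
X = np.zeros((len(y), 1))  # DummyClassifier ignores the feature values

majority = DummyClassifier(strategy="most_frequent").fit(X, y)
print("majority-class prediction:", majority.predict(X[:1])[0])
print(f"majority-class accuracy: {majority.score(X, y):.2%}")
```

Run on the notebook's own `y_train` and `y_test`, the same baseline would show how much of the 74% comes from the features rather than from class imbalance.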

As a final step, I will add `SpaCy` word embedding vectors for each opinion to see whether they improve model performance.

### SpaCy Vectors

The `SpaCy` package converts each document into a 300-dimensional vector that attempts to capture the substance of the text numerically; with a model such as `en_core_web_md`, a document's vector is by default the average of its token vectors. I previously created these vectors and saved them to a local file, so I will import that file and add the vectors as additional features to the `df_ml` dataframe. The code I used to create the vectors is in the next cell, commented out because it does not need to be run again.

In [None]:
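# import and instantiate SpaCy
# import spacy
# nlp = spacy.load('en_core_web_md')

# create spacy vectors: one 300-dimensional document vector per opinion
# from tqdm import tqdm
# df_spacy = df[['id', 'judge']].copy()  # assumed starting frame; the saved file has id and judge columns
# for row in tqdm(range(len(df))):
#     doc = nlp(df.loc[row, 'opinion'])
#     for i in range(300):
#         df_spacy.loc[row, 'spacy_{}'.format(i)] = doc.vector[i]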
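The commented loop fills `df_spacy` one cell at a time with `.loc`, which for 80,749 opinions means about 24 million single-cell assignments. A sketch of a faster pattern, assuming the document vectors are collected first and stacked into one array: the helper `build_vector_frame` is hypothetical, and the spaCy call appears only as a comment because it needs the `en_core_web_md` model installed. For large corpora, spaCy's `nlp.pipe` can also stream the documents in batches.

```python
import numpy as np
import pandas as pd


def build_vector_frame(texts, vectorize, dim=300, prefix="spacy_"):
    """Return a DataFrame with one row per text and `dim` vector columns.

    `texts` is a pandas Series; its index is reused so the result lines up
    with the source dataframe in a later pd.concat(axis=1).
    `vectorize` maps one text to a 1-D array of length `dim`.
    """
    rows = [np.asarray(vectorize(text), dtype=np.float32) for text in texts]
    matrix = np.vstack(rows) if rows else np.empty((0, dim), dtype=np.float32)
    if matrix.shape[1] != dim:
        raise ValueError(f"expected {dim}-dimensional vectors, got {matrix.shape[1]}")
    columns = [f"{prefix}{i}" for i in range(dim)]
    return pd.DataFrame(matrix, index=texts.index, columns=columns)


# With spaCy this could be called as (assumption, not run here):
#   nlp = spacy.load("en_core_web_md")
#   df_spacy = build_vector_frame(df["opinion"], lambda t: nlp(t).vector)
# Toy vectorizer for illustration: every component equals the text length.
toy = pd.Series(["AFFIRMED.", "This case involves the adoption"], index=[0, 1])
frame = build_vector_frame(toy, lambda t: np.full(300, len(t)))
print(frame.shape)
print(frame.iloc[:, :3])
```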

In [4]:
# import file
df_spacy = pd.read_csv('df_spacy.csv', index_col=0)

In [7]:
df_spacy.head()

Unnamed: 0,spacy_0,spacy_1,spacy_2,spacy_3,spacy_4,...,spacy_295,spacy_296,spacy_297,spacy_298,spacy_299
0,-0.018235,0.141126,-0.077766,-0.04114,-0.001164,...,-0.03897,0.027272,-0.053127,0.03063,0.037495
1,-0.084114,0.211868,-0.025314,-0.079775,-0.034274,...,0.051489,-0.057511,-0.023146,-0.061747,0.057839
2,-0.031218,0.177579,-0.110459,-0.015997,0.02276,...,-0.047207,0.005839,-0.020763,0.026576,0.027186
3,-0.021508,0.177556,-0.105615,-0.043884,0.002879,...,-0.031005,0.010797,-0.053232,0.025511,0.015504
4,0.017297,0.15921,-0.056437,-0.06493,0.023885,...,-0.018201,-0.012622,-0.065629,0.02271,0.022754


In [6]:
# drop id and judge, which df_ml already contains, so the concat does not duplicate them
del df_spacy['id']
del df_spacy['judge']
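Dropping `id` and `judge` matters because `pd.concat(..., axis=1)` keeps every column it is given, so the shared columns would otherwise appear twice in `df_ml`. The same call also aligns rows on the index, which is why the frames being combined need to share the same row index. A toy illustration with hypothetical miniature frames (the names `ml` and `spacy_vecs` and their values are made up for this sketch):

```python
import pandas as pd

# Hypothetical miniature versions of df_ml and df_spacy sharing the same row index.
ml = pd.DataFrame({"id": [53839, 53841], "judge": ["frye", "per_curiam"],
                   "word_count": [0.09, 0.0]})
spacy_vecs = pd.DataFrame({"id": [53839, 53841], "judge": ["frye", "per_curiam"],
                           "spacy_0": [-0.018, -0.084]})

# Without dropping, concat keeps both copies of id and judge.
dup = pd.concat([ml, spacy_vecs], axis=1)
print(list(dup.columns))

# Dropping the shared columns first leaves a single copy of each.
clean = pd.concat([ml, spacy_vecs.drop(columns=["id", "judge"])], axis=1)
print(list(clean.columns))

# concat(axis=1) aligns on the index, so a mismatched index produces NaN gaps.
shifted = spacy_vecs.drop(columns=["id", "judge"])
shifted.index = [1, 2]
misaligned = pd.concat([ml, shifted], axis=1)
print(misaligned)
```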

In [8]:
df_ml.head()

Unnamed: 0,id,judge,type_concurrence,type_concurring-in-part-and-dissenting-in-part,type_dissent,type_majority,type_rehearing,year_1779,year_1784,year_1787,year_1789,year_1790,year_1791,year_1792,year_1793,year_1794,year_1795,year_1796,year_1797,year_1798,year_1799,year_1800,year_1801,year_1802,year_1803,year_1804,year_1805,year_1806,year_1807,year_1808,year_1809,year_1810,year_1811,year_1812,year_1813,year_1814,year_1815,year_1816,year_1817,year_1818,year_1819,year_1820,year_1821,year_1822,year_1823,year_1824,year_1825,year_1826,year_1827,year_1828,year_1829,year_1830,year_1831,year_1832,year_1833,year_1834,year_1835,year_1836,year_1837,year_1838,year_1839,year_1840,year_1841,year_1842,year_1843,year_1844,year_1845,year_1846,year_1847,year_1848,year_1849,year_1850,year_1851,year_1852,year_1853,year_1854,year_1855,year_1856,year_1857,year_1858,year_1859,year_1860,year_1861,year_1862,year_1863,year_1864,year_1866,year_1867,year_1868,year_1869,year_1870,year_1871,year_1872,year_1873,year_1874,year_1875,year_1876,year_1877,year_1878,year_1879,year_1880,year_1881,year_1882,year_1883,year_1884,year_1885,year_1886,year_1887,year_1888,year_1889,year_1890,year_1891,year_1892,year_1893,year_1894,year_1895,year_1896,year_1897,year_1898,year_1899,year_1900,year_1901,year_1902,year_1903,year_1904,year_1905,year_1906,year_1907,year_1908,year_1909,year_1910,year_1911,year_1912,year_1913,year_1914,year_1915,year_1916,year_1917,year_1918,year_1919,year_1920,year_1921,year_1922,year_1923,year_1924,year_1925,year_1926,year_1927,year_1928,year_1929,year_1930,year_1931,year_1932,year_1933,year_1934,year_1935,year_1936,year_1937,year_1938,year_1939,year_1940,year_1941,year_1942,year_1943,year_1944,year_1945,year_1946,year_1947,year_1948,year_1949,year_1950,year_1951,year_1952,year_1953,year_1954,year_1955,year_1956,year_1957,year_1958,year_1959,year_1960,year_1961,year_1962,year_1963,year_1964,year_1965,year_1966,year_1967,year_1968,year_1969,year_1970,year_1971,year_1972,year_1973,year_1974,year_1975,year_1976,year_1977,year_1978,year_1979,year_1980,year_1981,year_1982,year_1983,year_1984,year_1985,year_1986,year_1987,year_1988,year_1989,year_1990,year_1991,year_1992,year_1993,year_1994,year_1995,year_1996,year_1997,year_1998,year_1999,year_2000,year_2001,year_2002,year_2003,year_2004,year_2005,year_2006,year_2007,year_2008,year_2009,year_2010,year_2011,year_2012,year_2013,year_2014,year_2015,year_2016,year_2017,word_count,sentence_count,avg_sent_length,polarity,subjectivity
0,53839,frye,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.088804,0.098247,0.093527,0.501317,0.473721
1,53841,per_curiam,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000265,0.000923,0.018977,0.444444,0.0
2,53843,parker_sarah,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.192718,0.186347,0.108294,0.437718,0.321622
3,53847,whichard,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.124379,0.123155,0.105457,0.430596,0.415121
4,53848,webb,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.027348,0.034133,0.081419,0.473283,0.415155


In [9]:
# add the tf-idf features and the SpaCy word embedding vectors to the df_ml dataframe (rows are aligned on the index)
df_ml = pd.concat([df_ml, df_tfidf, df_spacy], axis=1)
df_ml.head()

Unnamed: 0,id,judge,type_concurrence,type_concurring-in-part-and-dissenting-in-part,type_dissent,type_majority,type_rehearing,year_1779,year_1784,year_1787,year_1789,year_1790,year_1791,year_1792,year_1793,year_1794,year_1795,year_1796,year_1797,year_1798,year_1799,year_1800,year_1801,year_1802,year_1803,year_1804,year_1805,year_1806,year_1807,year_1808,year_1809,year_1810,year_1811,year_1812,year_1813,year_1814,year_1815,year_1816,year_1817,year_1818,year_1819,year_1820,year_1821,year_1822,year_1823,year_1824,year_1825,year_1826,year_1827,year_1828,year_1829,year_1830,year_1831,year_1832,year_1833,year_1834,year_1835,year_1836,year_1837,year_1838,year_1839,year_1840,year_1841,year_1842,year_1843,year_1844,year_1845,year_1846,year_1847,year_1848,year_1849,year_1850,year_1851,year_1852,year_1853,year_1854,year_1855,year_1856,year_1857,year_1858,year_1859,year_1860,year_1861,year_1862,year_1863,year_1864,year_1866,year_1867,year_1868,year_1869,year_1870,year_1871,year_1872,year_1873,year_1874,year_1875,year_1876,year_1877,year_1878,year_1879,year_1880,year_1881,year_1882,year_1883,year_1884,year_1885,year_1886,year_1887,year_1888,year_1889,year_1890,year_1891,year_1892,year_1893,year_1894,year_1895,year_1896,year_1897,year_1898,year_1899,year_1900,year_1901,year_1902,year_1903,year_1904,year_1905,year_1906,year_1907,year_1908,year_1909,year_1910,year_1911,year_1912,year_1913,year_1914,year_1915,year_1916,year_1917,year_1918,year_1919,year_1920,year_1921,year_1922,year_1923,year_1924,year_1925,year_1926,year_1927,year_1928,year_1929,year_1930,year_1931,year_1932,year_1933,year_1934,year_1935,year_1936,year_1937,year_1938,year_1939,year_1940,year_1941,year_1942,year_1943,year_1944,year_1945,year_1946,year_1947,year_1948,year_1949,year_1950,year_1951,year_1952,year_1953,year_1954,year_1955,year_1956,year_1957,year_1958,year_1959,year_1960,year_1961,year_1962,year_1963,year_1964,year_1965,year_1966,year_1967,year_1968,year_1969,year_1970,year_1971,year_1972,year_1973,year_1974,year_1975,year_1976,year_1977,year_1978,year_1979,year_1980,year_1981,year_1982,year_1983,year_1984,year_1985,year_1986,year_1987,year_1988,year_1989,year_1990,year_1991,year_1992,year_1993,year_1994,year_1995,year_1996,year_1997,year_1998,year_1999,year_2000,year_2001,year_2002,year_2003,year_2004,year_2005,year_2006,year_2007,year_2008,year_2009,year_2010,year_2011,year_2012,year_2013,year_2014,year_2015,year_2016,year_2017,word_count,sentence_count,avg_sent_length,polarity,subjectivity,00,000,10,100,101,102,103,...,spacy_50,spacy_51,spacy_52,spacy_53,spacy_54,spacy_55,spacy_56,spacy_57,spacy_58,spacy_59,spacy_60,spacy_61,spacy_62,spacy_63,spacy_64,spacy_65,spacy_66,spacy_67,spacy_68,spacy_69,spacy_70,spacy_71,spacy_72,spacy_73,spacy_74,spacy_75,spacy_76,spacy_77,spacy_78,spacy_79,spacy_80,spacy_81,spacy_82,spacy_83,spacy_84,spacy_85,spacy_86,spacy_87,spacy_88,spacy_89,spacy_90,spacy_91,spacy_92,spacy_93,spacy_94,spacy_95,spacy_96,spacy_97,spacy_98,spacy_99,spacy_100,spacy_101,spacy_102,spacy_103,spacy_104,spacy_105,spacy_106,spacy_107,spacy_108,spacy_109,spacy_110,spacy_111,spacy_112,spacy_113,spacy_114,spacy_115,spacy_116,spacy_117,spacy_118,spacy_119,spacy_120,spacy_121,spacy_122,spacy_123,spacy_124,spacy_125,spacy_126,spacy_127,spacy_128,spacy_129,spacy_130,spacy_131,spacy_132,spacy_133,spacy_134,spacy_135,spacy_136,spacy_137,spacy_138,spacy_139,spacy_140,spacy_141,spacy_142,spacy_143,spacy_144,spacy_145,spacy_146,spacy_147,spacy_148,spacy_149,spacy_150,spacy_151,spacy_152,spacy_153,spacy_154,spacy_155,spacy_156,spacy_157,spacy_158,spacy_159,spacy_160,spacy_161,spacy_162,spacy_163,spacy_164,spacy_165,spacy_166,spacy_167,spacy_168,spacy_169,spacy_170,spacy_171,spacy_172,spacy_173,spacy_174,spacy_175,spacy_176,spacy_177,spacy_178,spacy_179,spacy_180,spacy_181,spacy_182,spacy_183,spacy_184,spacy_185,spacy_186,spacy_187,spacy_188,spacy_189,spacy_190,spacy_191,spacy_192,spacy_193,spacy_194,spacy_195,spacy_196,spacy_197,spacy_198,spacy_199,spacy_200,spacy_201,spacy_202,spacy_203,spacy_204,spacy_205,spacy_206,spacy_207,spacy_208,spacy_209,spacy_210,spacy_211,spacy_212,spacy_213,spacy_214,spacy_215,spacy_216,spacy_217,spacy_218,spacy_219,spacy_220,spacy_221,spacy_222,spacy_223,spacy_224,spacy_225,spacy_226,spacy_227,spacy_228,spacy_229,spacy_230,spacy_231,spacy_232,spacy_233,spacy_234,spacy_235,spacy_236,spacy_237,spacy_238,spacy_239,spacy_240,spacy_241,spacy_242,spacy_243,spacy_244,spacy_245,spacy_246,spacy_247,spacy_248,spacy_249,spacy_250,spacy_251,spacy_252,spacy_253,spacy_254,spacy_255,spacy_256,spacy_257,spacy_258,spacy_259,spacy_260,spacy_261,spacy_262,spacy_263,spacy_264,spacy_265,spacy_266,spacy_267,spacy_268,spacy_269,spacy_270,spacy_271,spacy_272,spacy_273,spacy_274,spacy_275,spacy_276,spacy_277,spacy_278,spacy_279,spacy_280,spacy_281,spacy_282,spacy_283,spacy_284,spacy_285,spacy_286,spacy_287,spacy_288,spacy_289,spacy_290,spacy_291,spacy_292,spacy_293,spacy_294,spacy_295,spacy_296,spacy_297,spacy_298,spacy_299
[Output omitted: five very wide rows (index 0 to 4; ids 53839, 53841, 53843, 53847, 53848; judges frye, per_curiam, parker_sarah, whichard, webb). Each row is a long run of 0/1 indicator columns followed by floating-point feature values, truncated by pandas with `...`.]


In [10]:
# Separate the features (every column after `id` and `judge`) from the label
X = df_ml.iloc[:, 2:]
y = df_ml.loc[:, 'judge']

# Split into training and test sets (default 75% / 25%), stratified so that
# each judge keeps the same share of opinions in both sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6, stratify=y)
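Passing `stratify=y` matters here because some judges write far more opinions than others. A minimal sketch on hypothetical toy labels (not the opinion data) showing that a stratified split keeps each class's share identical in the training and test sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical, imbalanced labels standing in for the `judge` column
y = np.array(['frye'] * 60 + ['webb'] * 32 + ['whichard'] * 8)
X = np.arange(len(y)).reshape(-1, 1)  # dummy single-feature matrix

# Default test_size is 0.25; stratify=y preserves class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=6, stratify=y)

for name in np.unique(y):
    print(f"{name}: train {(y_train == name).mean():.2f}, "
          f"test {(y_test == name).mean():.2f}")
```

The class counts were chosen so that a 25% split divides them evenly; with other counts scikit-learn rounds each class's allocation, so the shares match only approximately.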

### Logistic Regression

In [11]:
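from sklearn.linear_model import LogisticRegression
# C=100 means weak L2 regularization (C is the inverse of its strength).
# multi_class is omitted: 'auto' is already the default and the parameter
# is deprecated in recent scikit-learn releases.
lr = LogisticRegression(C=100, solver='lbfgs', max_iter=3000, random_state=6)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
print('Accuracy on training set = {}'.format(lr.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(lr.score(X_test, y_test)))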



Accuracy on training set = 0.9999339508924886
Accuracy on test set = 0.7542599564097484

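For a classifier, `score` returns mean accuracy: the fraction of rows whose predicted label matches the true one. A small sketch on synthetic data from `make_classification` (an assumption standing in for the opinion features) confirming that this equals `accuracy_score` applied to the output of `predict`, which is what `y_pred_lr` holds:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic four-class data, not the opinion dataset
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=4, random_state=6)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=6, stratify=y)

# Same hyperparameters as the cell above
lr = LogisticRegression(C=100, max_iter=3000, random_state=6)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# score() is accuracy_score(y_true, predict(X)) under the hood
manual = accuracy_score(y_test, y_pred)
print(f"score(): {lr.score(X_test, y_test):.4f}  "
      f"accuracy_score(): {manual:.4f}")
```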

### Linear Support Vector Machine Classifier

In [12]:
from sklearn.svm import LinearSVC
svc = LinearSVC(max_iter=1500, random_state=6)
svc.fit(X_train, y_train)
y_pred_svc = svc.predict(X_test)
print('Accuracy on training set = {}'.format(svc.score(X_train, y_train)))
print('Accuracy on test set = {}'.format(svc.score(X_test, y_test)))

Accuracy on training set = 0.9746866795462427
Accuracy on test set = 0.7675847037844263
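A variant this notebook did not run: `LinearSVC` is sensitive to feature scale, and its solver can stop at `max_iter` with a `ConvergenceWarning` when columns differ widely in magnitude. Standardizing the features in a `make_pipeline` is a common remedy. The sketch uses synthetic data with one deliberately inflated column; whether scaling changes the accuracy on the opinion features is untested here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data, not the opinion dataset
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=4, random_state=6)
X[:, 0] *= 1000  # exaggerate one feature's scale for illustration

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=6, stratify=y)

# The scaler is fit on the training data only, then applied to the test data
svc = make_pipeline(StandardScaler(), LinearSVC(max_iter=1500, random_state=6))
svc.fit(X_train, y_train)
print(f"Accuracy on test set = {svc.score(X_test, y_test):.4f}")
```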


Adding the `spaCy` vectors improved the accuracy of both models by approximately 2%, which suggests that the substance of the opinions carries signal for the models. The final results:

| Model               | Train accuracy | Test accuracy |
|---------------------|----------------|---------------|
| Linear SVC          | 0.9747         | 0.7676        |
| Logistic Regression | 0.9999         | 0.7543        |
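Building the comparison table from the scores, rather than retyping them, avoids rounding slips. A sketch with the values printed above hard-coded; inside the notebook one would call `svc.score(...)` and `lr.score(...)` directly. The `Gap` column (train minus test accuracy) is an addition for illustration.

```python
import pandas as pd

# Scores copied from the outputs printed above
results = pd.DataFrame(
    {
        "Train": [0.9746866795462427, 0.9999339508924886],
        "Test": [0.7675847037844263, 0.7542599564097484],
    },
    index=["Linear SVC", "Logistic Regression"],
)
results["Gap"] = results["Train"] - results["Test"]

# Round once, consistently, for display
print(results.round(4))
```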

With both models reaching a test accuracy above 75%, I have built two models that should provide value to my clients.
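How strong 75% is depends on the class balance: a classifier that always predicts the most frequent judge sets the floor any real model must beat. A hedged sketch using scikit-learn's `DummyClassifier` on hypothetical labels with one dominant class; in the notebook one would fit it on `X_train, y_train` and compare its `score(X_test, y_test)` against the table above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Hypothetical labels with one dominant class (not the real judge counts)
y = np.array(["per_curiam"] * 40 + ["frye"] * 32 + ["webb"] * 28)
X = np.zeros((len(y), 1))  # DummyClassifier ignores the features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=6, stratify=y)

# Always predicts the most frequent label seen in training
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
print(f"Majority-class baseline accuracy: {baseline.score(X_test, y_test):.2f}")
```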