In [1]:
import pandas as pd
import seaborn as sns

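# Cell-level style functions for the pandas Styler. Each reads a module-level set
# (Same_12, Same_13, Same_all) that the per-industry cells below reassign.
def highlight_12(val):
    # Blue: company appears in both the sample and cosine similarity portfolios.
    color = 'blue' if val in Same_12 else 'black'
    return 'color: %s' % color

def highlight_13(val):
    # Green: company appears in both the sample and factor model portfolios.
    color = 'green' if val in Same_13 else 'black'
    return 'color: %s' % color

def highlight_all(val):
    # Red: company appears in all three portfolios.
    color = 'red' if val in Same_all else 'black'
    return 'color: %s' % color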

## Portfolio Analysis Results
In this section, we display the portfolio performance table and the portfolio weights tables for each industry under three kinds of estimates: sample, cosine similarity, and factor model.

### Portfolio Performance
Since no optimal portfolio can be generated for the industry `Crude Petroleum and Natural Gas` under the sample estimate or the cosine similarity estimate, we exclude this industry from the performance comparison table.



In [2]:
columns = pd.MultiIndex.from_product([["Prepackaged Software", 
                                       "Pharmaceutical Preparations", "Real Estate Investment Trusts", 
                                       "State Commercial Banks",],
                                      ['Sample', 'Cosine Similarity', 'Factor Model']])

data = [[0.6,1.1,2.1,
         1.2,1.2,0.4,
         0.5,0.6,0.3,
         1.2,1.1,0.9],
        [2.4,2.9,15.9,
         2.1,2.6,12.2,
         1.8,1.7,8.0,
         2.7,2.2,20.8],
        [-0.57,-0.30,0.00,
         -0.35,-0.32,-0.13,
         -0.80,-0.81,-0.22,
         -0.28,-0.38,-0.05]]

methods = ["Expected Annual Return", "Annual Volatility", "Sharpe Ratio"]

df = pd.DataFrame(data, index = methods, columns = columns).T.round(2)

cm = sns.light_palette("#5CCDC6", n_colors = 35, as_cmap=True)

df.style.background_gradient(cmap=cm)

Industry,Estimate,Expected Annual Return,Annual Volatility,Sharpe Ratio
Prepackaged Software,Sample,0.6,2.4,-0.57
Prepackaged Software,Cosine Similarity,1.1,2.9,-0.3
Prepackaged Software,Factor Model,2.1,15.9,0.0
Pharmaceutical Preparations,Sample,1.2,2.1,-0.35
Pharmaceutical Preparations,Cosine Similarity,1.2,2.6,-0.32
Pharmaceutical Preparations,Factor Model,0.4,12.2,-0.13
Real Estate Investment Trusts,Sample,0.5,1.8,-0.8
Real Estate Investment Trusts,Cosine Similarity,0.6,1.7,-0.81
Real Estate Investment Trusts,Factor Model,0.3,8.0,-0.22
State Commercial Banks,Sample,1.2,2.7,-0.28
State Commercial Banks,Cosine Similarity,1.1,2.2,-0.38
State Commercial Banks,Factor Model,0.9,20.8,-0.05



From the table above, we can see that the factor model portfolios have much higher volatility, while the cosine similarity portfolios give results similar to those of the sample portfolios.
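
To make the volatility gap concrete, the sketch below divides each estimate's annual volatility by the sample estimate's volatility for the same industry. The numbers are copied from the performance data above; the dictionary layout and the names `volatility` and `ratio_to_sample` are illustrative choices, not part of the original notebook.

```python
# Annual volatility per industry, in the order (Sample, Cosine Similarity, Factor Model).
# Values are copied from the performance data above.
volatility = {
    "Prepackaged Software": (2.4, 2.9, 15.9),
    "Pharmaceutical Preparations": (2.1, 2.6, 12.2),
    "Real Estate Investment Trusts": (1.8, 1.7, 8.0),
    "State Commercial Banks": (2.7, 2.2, 20.8),
}


def ratio_to_sample(values):
    """Return (cosine / sample, factor / sample) volatility ratios."""
    sample, cosine, factor = values
    return cosine / sample, factor / sample


ratios = {industry: ratio_to_sample(v) for industry, v in volatility.items()}

for industry, (cos_ratio, factor_ratio) in ratios.items():
    print(f"{industry}: cosine x{cos_ratio:.2f}, factor x{factor_ratio:.2f}")
```

With these figures, the cosine similarity volatility stays within roughly 0.8 to 1.25 times the sample volatility, while the factor model volatility ranges from about 4.4 to 7.7 times the sample volatility.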

Next, we look at the companies each model selects to determine whether cosine similarity analysis and the factor model can be used to construct portfolios similar to the sample estimate portfolio.


In [3]:
columns = pd.MultiIndex.from_product([["Prepackaged Software", "Crude Petroleum and Natural Gas",
                                       "Pharmaceutical Preparations", "Real Estate Investment Trusts", 
                                       "State Commercial Banks",],
                                      ['Sample', 'Cosine Similarity', 'Factor Model']])

data = [[0.6,1.1,2.1,
         " "," ",0.6,
         1.2,1.2,0.4,
         0.5,0.6,0.3,
         1.2,1.1,0.9],
        [2.4,2.9,15.9,
         " "," ",32.8,
         2.1,2.6,12.2,
         1.8,1.7,8.0,
         2.7,2.2,20.8],
        [-0.57,-0.30,0.00,
         " "," ",-0.04,
         -0.35,-0.32,-0.13,
         -0.80,-0.81,-0.22,
         -0.28,-0.38,-0.05]]

methods = ["Expected Annual Return", "Annual Volatility", "Sharpe Ratio"]

df = pd.DataFrame(data, index = methods, columns = columns).T.round(2)

cm = sns.light_palette("#5CCDC6", n_colors = 35, as_cmap=True)

#df.style.background_gradient(cmap=cm)

### Portfolio Weights
We present the portfolio weights tables to compare the three estimates.

First, we highlight in `blue` the companies that appear in both the sample covariance estimate portfolio and the portfolio estimated from the cosine similarity of the business descriptions.

Then, we highlight in `green` the companies that appear in both the sample covariance estimate portfolio and the portfolio estimated from the factor model, which is based on business descriptions and return data.

Companies that appear in all three constructed portfolios are highlighted in `red` in the portfolio weights table.
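
The colour rules above can be sketched with a small factory that binds the set of shared names explicitly, instead of reading a global such as `Same_12` at call time. The name `make_highlighter` and the example company names are hypothetical; the returned function follows the same contract as `highlight_12`, taking one cell value and returning a CSS string.

```python
def make_highlighter(shared_names, color):
    """Build a cell-level style function for one set of shared company names."""
    # Snapshot the names so that reassigning the original set later has no effect.
    shared = frozenset(shared_names)

    def highlight(val):
        # Weights (floats) and the " " filler never match a company name,
        # so they fall through to black.
        return f"color: {color if val in shared else 'black'}"

    return highlight


# Hypothetical names, for illustration only.
highlight_blue = make_highlighter({"COMPANY A", "COMPANY B"}, "blue")
print(highlight_blue("COMPANY A"))  # color: blue
print(highlight_blue(0.2))          # color: black
```

A function built this way could be passed to the Styler in place of `highlight_12`, for example `software_12.style.applymap(make_highlighter(Same_12, 'blue'))`, so each table carries its own set of names.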

#### Prepackaged Software (mass reproduction of software)

In [4]:
sample_software = pd.read_csv("data/min_vol_sample_Prepackaged_Software.csv")
cos_sim_software = pd.read_csv("data/min_vol_cos_sim_Prepackaged_Software.csv")
factor_model_software = pd.read_csv("data/min_vol_factor_model_Prepackaged_Software.csv")

sample_software = sample_software.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)
cos_sim_software = cos_sim_software.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)
factor_model_software = factor_model_software.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)

columns = pd.MultiIndex.from_product([['Sample Estimate', 'Cosine Similarity Estimate'], 
                                      ['Company Name', 'Weight']])
software_12 = pd.concat([sample_software, cos_sim_software], axis=1)
software_12.columns = columns
software_12 = software_12.fillna(" ")

columns = pd.MultiIndex.from_product([['Sample Estimate', 'Factor Model Estimate'], 
                                      ['Company Name', 'Weight']])
software_13 = pd.concat([sample_software, factor_model_software], axis=1)
software_13.columns = columns
software_13 = software_13.fillna(" ")

columns = pd.MultiIndex.from_product([['Sample Estimate', 'Cosine Similarity Estimate', 'Factor Model Estimate'], 
                                      ['Company Name', 'Weight']])
software = pd.concat([pd.concat([sample_software, cos_sim_software], axis=1), factor_model_software], axis=1)
software.columns = columns
software = software.fillna(" ")

Same_12 = (set(sample_software.Company_Name) & set(cos_sim_software.Company_Name)) 
Same_13 = (set(sample_software.Company_Name) & set(factor_model_software.Company_Name)) 
Same_all = Same_12 & Same_13

##### Sample Estimate V.S. Cosine Similarity Estimate

In [5]:
software_12.style.applymap(highlight_12)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Cosine Similarity Estimate,Cosine Similarity Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight
0,"BLACK KNIGHT, INC.",0.2,"BLACK KNIGHT, INC.",0.2
1,AWARE INC /MA/,0.2,ORACLE CORP,0.16315
2,"ACI WORLDWIDE, INC.",0.11314,ANSYS INC,0.1539
3,ORACLE CORP,0.0917,ULTIMATE SOFTWARE GROUP INC,0.1035
4,"NUANCE COMMUNICATIONS, INC.",0.08608,NATIONAL INSTRUMENTS CORP,0.09372
5,COMMVAULT SYSTEMS INC,0.07381,"Q2 HOLDINGS, INC.",0.0619
6,"QUALYS, INC.",0.06668,"NUANCE COMMUNICATIONS, INC.",0.05947
7,QUMU CORP,0.05153,"ACI WORLDWIDE, INC.",0.04754
8,"ENDURANCE INTERNATIONAL GROUP HOLDINGS, INC.",0.02554,GSE SYSTEMS INC,0.04031
9,MICROSTRATEGY INC,0.0216,REALPAGE INC,0.02937


##### Sample Estimate V.S. Factor Model Estimate

In [6]:
software_13.style.applymap(highlight_13)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Factor Model Estimate,Factor Model Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight
0,"BLACK KNIGHT, INC.",0.2,"POLARITYTE, INC.",0.2
1,AWARE INC /MA/,0.2,"2U, INC.",0.19685
2,"ACI WORLDWIDE, INC.",0.11314,MICROSTRATEGY INC,0.13768
3,ORACLE CORP,0.0917,"ENDURANCE INTERNATIONAL GROUP HOLDINGS, INC.",0.1129
4,"NUANCE COMMUNICATIONS, INC.",0.08608,ANSYS INC,0.06834
5,COMMVAULT SYSTEMS INC,0.07381,TABLEAU SOFTWARE INC,0.05865
6,"QUALYS, INC.",0.06668,QUMU CORP,0.04424
7,QUMU CORP,0.05153,"BRIDGELINE DIGITAL, INC.",0.0355
8,"ENDURANCE INTERNATIONAL GROUP HOLDINGS, INC.",0.02554,3D SYSTEMS CORP,0.02694
9,MICROSTRATEGY INC,0.0216,NATIONAL INSTRUMENTS CORP,0.02466


##### All Three Estimates

In [7]:
software.style.applymap(highlight_all)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Cosine Similarity Estimate,Cosine Similarity Estimate,Factor Model Estimate,Factor Model Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight,Company Name,Weight
0,"BLACK KNIGHT, INC.",0.2,"BLACK KNIGHT, INC.",0.2,"POLARITYTE, INC.",0.2
1,AWARE INC /MA/,0.2,ORACLE CORP,0.16315,"2U, INC.",0.19685
2,"ACI WORLDWIDE, INC.",0.11314,ANSYS INC,0.1539,MICROSTRATEGY INC,0.13768
3,ORACLE CORP,0.0917,ULTIMATE SOFTWARE GROUP INC,0.1035,"ENDURANCE INTERNATIONAL GROUP HOLDINGS, INC.",0.1129
4,"NUANCE COMMUNICATIONS, INC.",0.08608,NATIONAL INSTRUMENTS CORP,0.09372,ANSYS INC,0.06834
5,COMMVAULT SYSTEMS INC,0.07381,"Q2 HOLDINGS, INC.",0.0619,TABLEAU SOFTWARE INC,0.05865
6,"QUALYS, INC.",0.06668,"NUANCE COMMUNICATIONS, INC.",0.05947,QUMU CORP,0.04424
7,QUMU CORP,0.05153,"ACI WORLDWIDE, INC.",0.04754,"BRIDGELINE DIGITAL, INC.",0.0355
8,"ENDURANCE INTERNATIONAL GROUP HOLDINGS, INC.",0.02554,GSE SYSTEMS INC,0.04031,3D SYSTEMS CORP,0.02694
9,MICROSTRATEGY INC,0.0216,REALPAGE INC,0.02937,NATIONAL INSTRUMENTS CORP,0.02466


For the industry `Prepackaged Software`, 6/15 of the companies in the sample estimate portfolio also appear in the cosine similarity estimate portfolio, and 4/15 also appear in the factor model estimate portfolio. When we compare all three portfolios, there is only one company in common.
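
Counts like these can be reproduced from the company-name columns with plain set operations. The sketch below uses short hypothetical portfolios; `overlap_summary` is an illustrative helper, and it assumes the denominator of each ratio is the number of companies in the sample portfolio, which the notebook does not state explicitly.

```python
def overlap_summary(sample, cosine, factor):
    """Count companies shared with the sample portfolio (the reference)."""
    sample, cosine, factor = set(sample), set(cosine), set(factor)
    same_12 = sample & cosine      # sample vs. cosine similarity
    same_13 = sample & factor      # sample vs. factor model
    same_all = same_12 & same_13   # present in all three portfolios
    return {
        "reference_size": len(sample),
        "sample_vs_cosine": len(same_12),
        "sample_vs_factor": len(same_13),
        "all_three": len(same_all),
    }


# Hypothetical portfolios, for illustration only.
summary = overlap_summary(
    sample=["A", "B", "C", "D", "E"],
    cosine=["A", "B", "F"],
    factor=["B", "C", "G", "H"],
)
print(summary)
```

In this toy case the sample portfolio shares 2 of its 5 companies with each alternative, and only `B` is common to all three.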

#### Pharmaceutical Preparations

In [8]:
sample_pharm = pd.read_csv("data/min_vol_sample_Pharmaceutical_Preparations.csv")
cos_sim_pharm = pd.read_csv("data/min_vol_cos_sim_Pharmaceutical_Preparations.csv")
factor_model_pharm = pd.read_csv("data/min_vol_factor_model_Pharmaceutical_Preparations.csv")

In [9]:
sample_pharm = sample_pharm.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)
cos_sim_pharm = cos_sim_pharm.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)
factor_model_pharm = factor_model_pharm.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)

In [10]:
columns = pd.MultiIndex.from_product([['Sample Estimate', 'Cosine Similarity Estimate'], 
                                      ['Company Name', 'Weight']])
pharm_12 = pd.concat([sample_pharm, cos_sim_pharm], axis=1)
pharm_12.columns = columns
pharm_12 = pharm_12.fillna(" ")

columns = pd.MultiIndex.from_product([['Sample Estimate', 'Factor Model Estimate'], 
                                      ['Company Name', 'Weight']])
pharm_13 = pd.concat([sample_pharm, factor_model_pharm], axis=1)
pharm_13.columns = columns
pharm_13 = pharm_13.fillna(" ")

columns = pd.MultiIndex.from_product([['Sample Estimate', 'Cosine Similarity Estimate', 'Factor Model Estimate'], 
                                      ['Company Name', 'Weight']])
pharm = pd.concat([pd.concat([sample_pharm, cos_sim_pharm], axis=1), factor_model_pharm], axis=1)
pharm.columns = columns
pharm = pharm.fillna(" ")

Same_12 = (set(sample_pharm.Company_Name) & set(cos_sim_pharm.Company_Name)) 
Same_13 = (set(sample_pharm.Company_Name) & set(factor_model_pharm.Company_Name)) 
Same_all = Same_12 & Same_13

##### Sample Estimate V.S. Cosine Similarity Estimate

In [11]:
pharm_12.style.applymap(highlight_12)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Cosine Similarity Estimate,Cosine Similarity Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight
0,"MERCK & CO., INC.",0.2,ZOETIS INC.,0.2
1,JOHNSON & JOHNSON,0.17878,PFIZER INC,0.2
2,BRISTOL MYERS SQUIBB CO,0.12824,JOHNSON & JOHNSON,0.18756
3,"ASSEMBLY BIOSCIENCES, INC.",0.05775,"MERCK & CO., INC.",0.13753
4,"PROPHASE LABS, INC.",0.0512,BIOSPECIFICS TECHNOLOGIES CORP,0.07394
5,ORAMED PHARMACEUTICALS INC.,0.04982,BIOMARIN PHARMACEUTICAL INC,0.04572
6,STEMLINE THERAPEUTICS INC,0.04273,BRISTOL MYERS SQUIBB CO,0.03719
7,"IMPRIMIS PHARMACEUTICALS, INC.",0.04181,LILLY ELI & CO,0.03562
8,PFENEX INC.,0.03777,XENCOR INC,0.02108
9,BIODELIVERY SCIENCES INTERNATIONAL INC,0.0368,"PACIRA PHARMACEUTICALS, INC.",0.01883


##### Sample Estimate V.S. Factor Model Estimate

In [12]:
pharm_13.style.applymap(highlight_13)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Factor Model Estimate,Factor Model Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight
0,"MERCK & CO., INC.",0.2,NATURES SUNSHINE PRODUCTS INC,0.13263
1,JOHNSON & JOHNSON,0.17878,ZOETIS INC.,0.11746
2,BRISTOL MYERS SQUIBB CO,0.12824,JOHNSON & JOHNSON,0.08628
3,"ASSEMBLY BIOSCIENCES, INC.",0.05775,PROGENICS PHARMACEUTICALS INC,0.08582
4,"PROPHASE LABS, INC.",0.0512,ARENA PHARMACEUTICALS INC,0.08524
5,ORAMED PHARMACEUTICALS INC.,0.04982,"TELIGENT, INC.",0.08129
6,STEMLINE THERAPEUTICS INC,0.04273,FLEXION THERAPEUTICS INC,0.07107
7,"IMPRIMIS PHARMACEUTICALS, INC.",0.04181,ANI PHARMACEUTICALS INC,0.06101
8,PFENEX INC.,0.03777,"ACLARIS THERAPEUTICS, INC.",0.05895
9,BIODELIVERY SCIENCES INTERNATIONAL INC,0.0368,XOMA CORP,0.05504


##### All Three Estimates

In [13]:
pharm.style.applymap(highlight_all)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Cosine Similarity Estimate,Cosine Similarity Estimate,Factor Model Estimate,Factor Model Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight,Company Name,Weight
0,"MERCK & CO., INC.",0.2,ZOETIS INC.,0.2,NATURES SUNSHINE PRODUCTS INC,0.13263
1,JOHNSON & JOHNSON,0.17878,PFIZER INC,0.2,ZOETIS INC.,0.11746
2,BRISTOL MYERS SQUIBB CO,0.12824,JOHNSON & JOHNSON,0.18756,JOHNSON & JOHNSON,0.08628
3,"ASSEMBLY BIOSCIENCES, INC.",0.05775,"MERCK & CO., INC.",0.13753,PROGENICS PHARMACEUTICALS INC,0.08582
4,"PROPHASE LABS, INC.",0.0512,BIOSPECIFICS TECHNOLOGIES CORP,0.07394,ARENA PHARMACEUTICALS INC,0.08524
5,ORAMED PHARMACEUTICALS INC.,0.04982,BIOMARIN PHARMACEUTICAL INC,0.04572,"TELIGENT, INC.",0.08129
6,STEMLINE THERAPEUTICS INC,0.04273,BRISTOL MYERS SQUIBB CO,0.03719,FLEXION THERAPEUTICS INC,0.07107
7,"IMPRIMIS PHARMACEUTICALS, INC.",0.04181,LILLY ELI & CO,0.03562,ANI PHARMACEUTICALS INC,0.06101
8,PFENEX INC.,0.03777,XENCOR INC,0.02108,"ACLARIS THERAPEUTICS, INC.",0.05895
9,BIODELIVERY SCIENCES INTERNATIONAL INC,0.0368,"PACIRA PHARMACEUTICALS, INC.",0.01883,XOMA CORP,0.05504


For the industry `Pharmaceutical Preparations`, 4/21 of the companies in the sample estimate portfolio also appear in the cosine similarity estimate portfolio, and 3/21 also appear in the factor model estimate portfolio. When we compare all three portfolios, there are two companies in common.

#### Real Estate Investment Trusts

In [14]:
sample_real_estate = pd.read_csv("data/min_vol_sample_Real_Estate_Investment_Trusts.csv")
cos_sim_real_estate = pd.read_csv("data/min_vol_cos_sim_Real_Estate_Investment_Trusts.csv")
factor_model_real_estate = pd.read_csv("data/min_vol_factor_model_Real_Estate_Investment_Trusts.csv")

In [15]:
sample_real_estate = sample_real_estate.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)
cos_sim_real_estate = cos_sim_real_estate.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)
factor_model_real_estate = factor_model_real_estate.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)

In [16]:
columns = pd.MultiIndex.from_product([['Sample Estimate', 'Cosine Similarity Estimate'], 
                                      ['Company Name', 'Weight']])
real_estate_12 = pd.concat([sample_real_estate, cos_sim_real_estate], axis=1)
real_estate_12.columns = columns
real_estate_12 = real_estate_12.fillna(" ")

columns = pd.MultiIndex.from_product([['Sample Estimate', 'Factor Model Estimate'], 
                                      ['Company Name', 'Weight']])
real_estate_13 = pd.concat([sample_real_estate, factor_model_real_estate], axis=1)
real_estate_13.columns = columns
real_estate_13 = real_estate_13.fillna(" ")

columns = pd.MultiIndex.from_product([['Sample Estimate', 'Cosine Similarity Estimate', 'Factor Model Estimate'], 
                                      ['Company Name', 'Weight']])
real_estate = pd.concat([pd.concat([sample_real_estate, cos_sim_real_estate], axis=1), factor_model_real_estate], axis=1)
real_estate.columns = columns
real_estate = real_estate.fillna(" ")

Same_12 = (set(sample_real_estate.Company_Name) & set(cos_sim_real_estate.Company_Name)) 
Same_13 = (set(sample_real_estate.Company_Name) & set(factor_model_real_estate.Company_Name)) 
Same_all = Same_12 & Same_13

##### Sample Estimate V.S. Cosine Similarity Estimate 

In [17]:
real_estate_12.style.applymap(highlight_12)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Cosine Similarity Estimate,Cosine Similarity Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight
0,EQUITY COMMONWEALTH,0.2,EQUITY COMMONWEALTH,0.16327
1,GREAT AJAX CORP.,0.2,SUN COMMUNITIES INC,0.14907
2,HMG COURTLAND PROPERTIES INC,0.12513,GREAT AJAX CORP.,0.13806
3,PUBLIC STORAGE,0.10938,EQUINIX INC,0.07068
4,ARES COMMERCIAL REAL ESTATE CORP,0.09107,"GAMING & LEISURE PROPERTIES, INC.",0.06734
5,CIM COMMERCIAL TRUST CORP,0.05461,PUBLIC STORAGE,0.06339
6,IMPAC MORTGAGE HOLDINGS INC,0.05108,DUKE REALTY CORP,0.05369
7,CROWN CASTLE INTERNATIONAL CORP,0.04875,HIGHWOODS PROPERTIES INC,0.05347
8,LADDER CAPITAL CORP,0.0442,"MFA FINANCIAL, INC.",0.05101
9,ALEXANDERS INC,0.02285,ANNALY CAPITAL MANAGEMENT INC,0.05094


##### Sample Estimate V.S. Factor Model Estimate 

In [18]:
real_estate_13.style.applymap(highlight_13)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Factor Model Estimate,Factor Model Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight
0,EQUITY COMMONWEALTH,0.2,EXTRA SPACE STORAGE INC.,0.00781
1,GREAT AJAX CORP.,0.2,WHITESTONE REIT,0.00781
2,HMG COURTLAND PROPERTIES INC,0.12513,SOTHERLY HOTELS INC.,0.00781
3,PUBLIC STORAGE,0.10938,CHERRY HILL MORTGAGE INVESTMENT CORP,0.00781
4,ARES COMMERCIAL REAL ESTATE CORP,0.09107,"JERNIGAN CAPITAL, INC.",0.00781
5,CIM COMMERCIAL TRUST CORP,0.05461,CIM COMMERCIAL TRUST CORP,0.00781
6,IMPAC MORTGAGE HOLDINGS INC,0.05108,AMERICAN CAMPUS COMMUNITIES INC,0.00781
7,CROWN CASTLE INTERNATIONAL CORP,0.04875,DYNEX CAPITAL INC,0.00781
8,LADDER CAPITAL CORP,0.0442,BRANDYWINE REALTY TRUST,0.00781
9,ALEXANDERS INC,0.02285,ISTAR INC.,0.00781


##### All Three Estimates

In [19]:
real_estate.style.applymap(highlight_all)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Cosine Similarity Estimate,Cosine Similarity Estimate,Factor Model Estimate,Factor Model Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight,Company Name,Weight
0,EQUITY COMMONWEALTH,0.2,EQUITY COMMONWEALTH,0.16327,EXTRA SPACE STORAGE INC.,0.00781
1,GREAT AJAX CORP.,0.2,SUN COMMUNITIES INC,0.14907,WHITESTONE REIT,0.00781
2,HMG COURTLAND PROPERTIES INC,0.12513,GREAT AJAX CORP.,0.13806,SOTHERLY HOTELS INC.,0.00781
3,PUBLIC STORAGE,0.10938,EQUINIX INC,0.07068,CHERRY HILL MORTGAGE INVESTMENT CORP,0.00781
4,ARES COMMERCIAL REAL ESTATE CORP,0.09107,"GAMING & LEISURE PROPERTIES, INC.",0.06734,"JERNIGAN CAPITAL, INC.",0.00781
5,CIM COMMERCIAL TRUST CORP,0.05461,PUBLIC STORAGE,0.06339,CIM COMMERCIAL TRUST CORP,0.00781
6,IMPAC MORTGAGE HOLDINGS INC,0.05108,DUKE REALTY CORP,0.05369,AMERICAN CAMPUS COMMUNITIES INC,0.00781
7,CROWN CASTLE INTERNATIONAL CORP,0.04875,HIGHWOODS PROPERTIES INC,0.05347,DYNEX CAPITAL INC,0.00781
8,LADDER CAPITAL CORP,0.0442,"MFA FINANCIAL, INC.",0.05101,BRANDYWINE REALTY TRUST,0.00781
9,ALEXANDERS INC,0.02285,ANNALY CAPITAL MANAGEMENT INC,0.05094,ISTAR INC.,0.00781


For the industry `Real Estate Investment Trusts`, 6/13 of the companies in the sample estimate portfolio also appear in the cosine similarity estimate portfolio, and 5/13 also appear in the factor model estimate portfolio. When we compare all three portfolios, there are two companies in common.

#### State Commercial Banks (commercial banking)

In [20]:
sample_banks = pd.read_csv("data/min_vol_sample_State_Commercial_Banks.csv")
cos_sim_banks = pd.read_csv("data/min_vol_cos_sim_State_Commercial_Banks.csv")
factor_model_banks = pd.read_csv("data/min_vol_factor_model_State_Commercial_Banks.csv")

In [21]:
sample_banks = sample_banks.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)
cos_sim_banks = cos_sim_banks.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)
factor_model_banks = factor_model_banks.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)

In [22]:
columns = pd.MultiIndex.from_product([['Sample Estimate', 'Cosine Similarity Estimate'], 
                                      ['Company Name', 'Weight']])
banks_12 = pd.concat([sample_banks, cos_sim_banks], axis=1)
banks_12.columns = columns
banks_12 = banks_12.fillna(" ")

columns = pd.MultiIndex.from_product([['Sample Estimate', 'Factor Model Estimate'], 
                                      ['Company Name', 'Weight']])
banks_13 = pd.concat([sample_banks, factor_model_banks], axis=1)
banks_13.columns = columns
banks_13 = banks_13.fillna(" ")

columns = pd.MultiIndex.from_product([['Sample Estimate', 'Cosine Similarity Estimate', 'Factor Model Estimate'], 
                                      ['Company Name', 'Weight']])
banks = pd.concat([pd.concat([sample_banks, cos_sim_banks], axis=1), factor_model_banks], axis=1)
banks.columns = columns
banks = banks.fillna(" ")

Same_12 = (set(sample_banks.Company_Name) & set(cos_sim_banks.Company_Name)) 
Same_13 = (set(sample_banks.Company_Name) & set(factor_model_banks.Company_Name)) 
Same_all = Same_12 & Same_13

##### Sample Estimate V.S. Cosine Similarity Estimate

In [23]:
banks_12.style.applymap(highlight_12)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Cosine Similarity Estimate,Cosine Similarity Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight
0,INVESTAR HOLDING CORP,0.1944,BANNER CORP,0.2
1,GUARANTY FEDERAL BANCSHARES INC,0.17724,INVESTAR HOLDING CORP,0.16789
2,VILLAGE BANK & TRUST FINANCIAL CORP.,0.13994,CITIZENS & NORTHERN CORP,0.11305
3,"RELIANT BANCORP, INC.",0.12273,BANK OF NEW YORK MELLON CORP,0.09816
4,"CAROLINA TRUST BANCSHARES, INC.",0.11786,INDEPENDENT BANK CORP /MI/,0.0954
5,BANK OF NEW YORK MELLON CORP,0.09533,EAST WEST BANCORP INC,0.08342
6,CITIZENS & NORTHERN CORP,0.05375,ENTERPRISE FINANCIAL SERVICES CORP,0.07078
7,FIRST COMMUNITY CORP /SC/,0.05076,S&T BANCORP INC,0.05201
8,MACKINAC FINANCIAL CORP /MI/,0.02478,BANK OF HAWAII CORP,0.04935
9,"FAUQUIER BANKSHARES, INC.",0.02143,HOWARD BANCORP INC,0.02931


##### Sample Estimate V.S. Factor Model Estimate

In [24]:
banks_13.style.applymap(highlight_13)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Factor Model Estimate,Factor Model Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight
0,INVESTAR HOLDING CORP,0.1944,"CAROLINA TRUST BANCSHARES, INC.",0.05794
1,GUARANTY FEDERAL BANCSHARES INC,0.17724,SUMMIT FINANCIAL GROUP INC,0.05359
2,VILLAGE BANK & TRUST FINANCIAL CORP.,0.13994,"ATLANTIC CAPITAL BANCSHARES, INC.",0.04742
3,"RELIANT BANCORP, INC.",0.12273,"INDEPENDENT BANK GROUP, INC.",0.04721
4,"CAROLINA TRUST BANCSHARES, INC.",0.11786,HOWARD BANCORP INC,0.04462
5,BANK OF NEW YORK MELLON CORP,0.09533,TEXAS CAPITAL BANCSHARES INC/TX,0.04227
6,CITIZENS & NORTHERN CORP,0.05375,UNITED BANKSHARES INC/WV,0.04209
7,FIRST COMMUNITY CORP /SC/,0.05076,COMMERCE BANCSHARES INC /MO/,0.04133
8,MACKINAC FINANCIAL CORP /MI/,0.02478,OHIO VALLEY BANC CORP,0.03739
9,"FAUQUIER BANKSHARES, INC.",0.02143,"LIVE OAK BANCSHARES, INC.",0.0355


##### All Three Estimates

In [25]:
banks.style.applymap(highlight_all)

Unnamed: 0_level_0,Sample Estimate,Sample Estimate,Cosine Similarity Estimate,Cosine Similarity Estimate,Factor Model Estimate,Factor Model Estimate
Unnamed: 0_level_1,Company Name,Weight,Company Name,Weight,Company Name,Weight
0,INVESTAR HOLDING CORP,0.1944,BANNER CORP,0.2,"CAROLINA TRUST BANCSHARES, INC.",0.05794
1,GUARANTY FEDERAL BANCSHARES INC,0.17724,INVESTAR HOLDING CORP,0.16789,SUMMIT FINANCIAL GROUP INC,0.05359
2,VILLAGE BANK & TRUST FINANCIAL CORP.,0.13994,CITIZENS & NORTHERN CORP,0.11305,"ATLANTIC CAPITAL BANCSHARES, INC.",0.04742
3,"RELIANT BANCORP, INC.",0.12273,BANK OF NEW YORK MELLON CORP,0.09816,"INDEPENDENT BANK GROUP, INC.",0.04721
4,"CAROLINA TRUST BANCSHARES, INC.",0.11786,INDEPENDENT BANK CORP /MI/,0.0954,HOWARD BANCORP INC,0.04462
5,BANK OF NEW YORK MELLON CORP,0.09533,EAST WEST BANCORP INC,0.08342,TEXAS CAPITAL BANCSHARES INC/TX,0.04227
6,CITIZENS & NORTHERN CORP,0.05375,ENTERPRISE FINANCIAL SERVICES CORP,0.07078,UNITED BANKSHARES INC/WV,0.04209
7,FIRST COMMUNITY CORP /SC/,0.05076,S&T BANCORP INC,0.05201,COMMERCE BANCSHARES INC /MO/,0.04133
8,MACKINAC FINANCIAL CORP /MI/,0.02478,BANK OF HAWAII CORP,0.04935,OHIO VALLEY BANC CORP,0.03739
9,"FAUQUIER BANKSHARES, INC.",0.02143,HOWARD BANCORP INC,0.02931,"LIVE OAK BANCSHARES, INC.",0.0355


For the industry `State Commercial Banks`, 3/11 of the companies in the sample estimate portfolio also appear in the cosine similarity estimate portfolio, and 6/11 also appear in the factor model estimate portfolio. When we compare all three portfolios, there is only one company in common.

Although the factor model portfolio has more companies in common with the sample portfolio, it is a large portfolio with 43 companies, so we cannot conclude that the factor model does a better job of constructing the portfolio.
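
One way to make this size caveat concrete is to relate the overlap to both portfolios rather than to the sample portfolio alone. The sketch below uses the figures quoted above (6 shared companies, 43 companies in the factor model portfolio) and assumes the sample portfolio holds 11 companies, which is one reading of the denominator in 6/11. The Jaccard index is a choice made for this sketch, not a measure the original analysis computed.

```python
def overlap_measures(n_common, n_reference, n_other):
    """Return overlap as a share of each portfolio and as a Jaccard index."""
    union = n_reference + n_other - n_common
    return {
        "share_of_reference": n_common / n_reference,
        "share_of_other": n_common / n_other,
        "jaccard": n_common / union,
    }


# State Commercial Banks, sample vs. factor model.
# n_reference=11 is an assumption read off the "6/11" ratio in the text.
banks_factor = overlap_measures(n_common=6, n_reference=11, n_other=43)
for name, value in banks_factor.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the six shared companies cover about 55% of the sample portfolio but only about 14% of the factor model portfolio, and the Jaccard index is 0.125.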

#### Crude Petroleum and Natural Gas
Since no optimal portfolio can be generated from the sample estimate or the cosine similarity estimate for the Crude Petroleum and Natural Gas industry, we display the portfolio weights for the factor model estimate only.

In [26]:
factor_model_crude = pd.read_csv("data/min_vol_factor_model_Crude_Petroleum_and_Natural_Gas.csv")

factor_model_crude = factor_model_crude.sort_values(by=["Weight"], ascending=False).reset_index(drop=True)

In [27]:
columns = pd.MultiIndex.from_product([['Factor Model Estimate'], 
                                      ['Company Name', 'Weight']])

factor_model_crude.columns = columns

In [28]:
factor_model_crude.style.applymap(highlight_all)

Unnamed: 0_level_0,Factor Model Estimate,Factor Model Estimate
Unnamed: 0_level_1,Company Name,Weight
0,CALIFORNIA RESOURCES CORP,0.16181
1,KOSMOS ENERGY LTD.,0.14627
2,ANTERO RESOURCES CORP,0.08452
3,CALLON PETROLEUM CO,0.07632
4,SM ENERGY CO,0.07055
5,PANHANDLE OIL & GAS INC,0.05922
6,WHITING PETROLEUM CORP,0.0587
7,PEDEVCO CORP,0.05331
8,CONTANGO OIL & GAS CO,0.04169
9,CONCHO RESOURCES INC,0.0377


### Conclusion

Overall, the cosine similarity analysis comes closer to the sample covariance estimate than the factor model does. From the performance table, we can conclude that the sample estimate and the cosine similarity estimate give similar expected returns and annual volatility for the minimum-variance portfolio.

With the sample estimate portfolio as the reference, fewer than half of its companies appear in the alternative portfolio in every comparison except the factor model for `State Commercial Banks` (6/11). Most portfolios generated by the cosine similarity estimate and the factor model estimate contain more companies than the sample portfolio; the factor model portfolios in particular usually contain about twice as many. Considering both portfolio size and the number of common companies, the cosine similarity estimate shows the highest degree of overlap for only one industry, `Prepackaged Software`.

To sum up, the feasibility of constructing similar portfolios using document embeddings of the companies' business descriptions in SEC filings is low. However, our research has certain limitations. Because the factor model relies on unsupervised learning, the accuracy of its topic selection is uncertain. Moreover, the informativeness of the words used in the word embeddings has not been confirmed.

For future research, we may apply topic modeling to the risk disclosure section of SEC filings and use the resulting risk factors in the factor model. The business descriptions of the companies may not explain much of their returns or their correlation with other companies, whereas the risk disclosures may reveal more information.