For our research we will work with the states of the USA, because their cannabis laws differ widely and a wide range of datasets is available for them.

In [143]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn import metrics
from sklearn import tree
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.feature_selection import VarianceThreshold
from sklearn.ensemble import RandomForestClassifier
from mlxtend.feature_selection import SequentialFeatureSelector
from io import StringIO

import time
import random
import copy
import sys

Please note that a NaN value in `Medical marijuana legalized`, for example, does not necessarily mean that medical marijuana is completely forbidden in a state; the state may allow some narrowly limited access to it.<br>
Also, if marijuana is legalized for recreational use, it automatically means that medical marijuana is legalized as well.

In [48]:
data_laws = pd.read_csv("state_marijuana_laws_10_2016.csv")  # State laws, 2016
# Comment out the line below to keep the dataframe unshuffled
data_laws = data_laws.sample(frac=1)
data_laws

Unnamed: 0,State,Medical marijuana legalized,Marijuana legalized for recreational use,No laws legalizing marijuana
0,Connecticut,Yes,,
27,Nebraska,,,Yes
7,Illinois,Yes,,
46,Nevada,,Yes,
29,Ohio,,,Yes
37,West Virginia,,,Yes
48,Vermont,,Yes,
10,Minnesota,Yes,,
30,Oklahoma,,,Yes
41,California,,Yes,


This Yes/NaN encoding is awkward to work with, so we will replace NaN values with 0 and `Yes` with 1.

In [49]:
data_laws.fillna(0, inplace=True)
data_laws.replace('Yes',1, inplace=True)
data_laws.head(6)

Unnamed: 0,State,Medical marijuana legalized,Marijuana legalized for recreational use,No laws legalizing marijuana
0,Connecticut,1,0,0
27,Nebraska,0,0,1
7,Illinois,1,0,0
46,Nevada,0,1,0
29,Ohio,0,0,1
37,West Virginia,0,0,1


In [50]:
data_laws.set_index('State',inplace=True)
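# Mask everything except the 1s, then stack so that each state keeps only the name of its status column
data_laws = data_laws[data_laws == 1].stack().reset_index().drop(columns=0)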
data_laws.rename(columns={'level_1': 'Marijuana laws status'}, inplace=True)
data_laws.head(6)

Unnamed: 0,State,Marijuana laws status
0,Connecticut,Medical marijuana legalized
1,Nebraska,No laws legalizing marijuana
2,Illinois,Medical marijuana legalized
3,Nevada,Marijuana legalized for recreational use
4,Ohio,No laws legalizing marijuana
5,West Virginia,No laws legalizing marijuana


The status can also be encoded as a single integer column:<br>
0 - No laws legalizing marijuana<br>
1 - Medical marijuana legalized<br>
2 - Marijuana legalized for recreational use<br>
Uncomment the line below to get integer values; for now we will work with strings.

In [51]:
#data_laws.replace({"No laws legalizing marijuana": 0, "Medical marijuana legalized": 1, "Marijuana legalized for recreational use": 2}, inplace=True)
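
As an alternative to replacing the strings outright, an ordered categorical keeps the readable labels and still exposes the integer codes 0, 1 and 2. The following is only a sketch: the three rows are copied from the output above, and the column name `status_code` is an assumption introduced for illustration.

```python
import pandas as pd

# Three rows taken from the data_laws output shown earlier
laws = pd.DataFrame({
    "State": ["Connecticut", "Nebraska", "Nevada"],
    "Marijuana laws status": [
        "Medical marijuana legalized",
        "No laws legalizing marijuana",
        "Marijuana legalized for recreational use",
    ],
})

# Labels ordered from most to least restrictive, matching codes 0, 1, 2
order = [
    "No laws legalizing marijuana",
    "Medical marijuana legalized",
    "Marijuana legalized for recreational use",
]
status = pd.Categorical(laws["Marijuana laws status"], categories=order, ordered=True)
laws["Marijuana laws status"] = status
laws["status_code"] = status.codes  # hypothetical helper column holding 0, 1 or 2

print(laws)
```

Because the categorical is ordered, comparisons such as `laws["Marijuana laws status"] >= "Medical marijuana legalized"` select every state with at least medical legalization.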

In [54]:
data_laws.loc[data_laws['State'] == "Colorado"]

Unnamed: 0,State,Marijuana laws status
44,Colorado,Marijuana legalized for recreational use


Now that we know the laws of each state, let's move on to marijuana usage.

## Part 2

We will use 3 different datasets that contain information about cannabis usage in individual states, in regions and in the USA as a whole.<br>
We are going to analyze them to see how much legalization affects the percentage of people who smoke cannabis, how usage depends on age, how it compares with other bad habits, and so on.
All of the datasets below are based on data from the NSDUH (National Survey on Drug Use and Health), so there is no need to compare them against each other.

In [121]:
data_regions = pd.read_csv("marijuana-use-2016.csv")
data_regions.head(6)

Unnamed: 0,Region,Year,Age Range,Marijuana Use,Measure Type,Variable,Value
0,Connecticut,2004-2006,12-17,First Use of Marijuana,Percent,Margins of Error,0.91
1,Connecticut,2004-2006,12-17,First Use of Marijuana,Percent,Marijuana Use,7.62
2,Connecticut,2004-2006,12-17,Marijuana Use in the Past Month,Percent,Margins of Error,1.38
3,Connecticut,2004-2006,12-17,Marijuana Use in the Past Month,Percent,Marijuana Use,8.39
4,Connecticut,2004-2006,12-17,Marijuana Use in the Past Year,Percent,Margins of Error,1.88


This dataset contains information on marijuana usage in different regions of America, divided by year range and age group.<br>
As we can see, every record is stored as two rows that differ only in the last two columns (`Variable` and `Value`).<br>
We will join each pair into a single row to improve the quality of our dataset and then move on for now.

In [122]:
data_regions["Margins of Error"] = np.nan
data_regions["Marijuana Usege (times per last year)"] = np.nan
key_cols = ["Region", "Year", "Age Range", "Marijuana Use"]
for i in range(len(data_regions) // 2):
    # Rows that share row i's key columns form one (margin of error, usage) pair
    same_key = (data_regions[key_cols] == data_regions[key_cols].iloc[i]).all(axis=1)
    to_merge = data_regions[same_key]
    row = data_regions.index[i]
    # Assign with a single .loc call; chained indexing (df[col].iloc[i] = ...) may write to a copy
    data_regions.loc[row, "Margins of Error"] = to_merge.loc[to_merge["Variable"] == "Margins of Error", "Value"].iloc[0]
    data_regions.loc[row, "Marijuana Usege (times per last year)"] = to_merge.loc[to_merge["Variable"] == "Marijuana Use", "Value"].iloc[0]
    # The second row of the pair is now redundant
    data_regions = data_regions.drop(data_regions.index[i + 1])

data_regions = data_regions.drop(columns=["Measure Type", "Variable", "Value"])

Unnamed: 0,Region,Year,Age Range,Marijuana Use,Margins of Error,Marijuana Usege (times per last year)
0,Connecticut,2004-2006,12-17,First Use of Marijuana,0.91,7.62
2,Connecticut,2004-2006,12-17,Marijuana Use in the Past Month,1.38,8.39
4,Connecticut,2004-2006,12-17,Marijuana Use in the Past Year,1.88,17.79
6,Connecticut,2004-2006,18-25,First Use of Marijuana,1.49,8.33
8,Connecticut,2004-2006,18-25,Marijuana Use in the Past Month,2.62,23.34
10,Connecticut,2004-2006,18-25,Marijuana Use in the Past Year,3.08,37.72
12,Connecticut,2004-2006,Over 17,First Use of Marijuana,0.18,1.07
14,Connecticut,2004-2006,Over 17,Marijuana Use in the Past Month,1.00,7.22
16,Connecticut,2004-2006,Over 17,Marijuana Use in the Past Year,1.36,12.59
18,Connecticut,2004-2006,Over 25,First Use of Marijuana,0.07,0.16


In [123]:
data_regions.head(6)

Unnamed: 0,Region,Year,Age Range,Marijuana Use,Margins of Error,Marijuana Usege (times per last year)
0,Connecticut,2004-2006,12-17,First Use of Marijuana,0.91,7.62
2,Connecticut,2004-2006,12-17,Marijuana Use in the Past Month,1.38,8.39
4,Connecticut,2004-2006,12-17,Marijuana Use in the Past Year,1.88,17.79
6,Connecticut,2004-2006,18-25,First Use of Marijuana,1.49,8.33
8,Connecticut,2004-2006,18-25,Marijuana Use in the Past Month,2.62,23.34
10,Connecticut,2004-2006,18-25,Marijuana Use in the Past Year,3.08,37.72
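
The pairing of rows can also be expressed as a reshape instead of a loop. This is a minimal sketch rather than the notebook's method: it uses four rows copied from the raw output above and `DataFrame.pivot`. The column name `Usage (%)` is an assumption, introduced because the `Variable` value `Marijuana Use` would otherwise clash with the existing `Marijuana Use` column.

```python
import pandas as pd

# Four raw rows copied from the data_regions output (two records, two rows each)
sample = pd.DataFrame({
    "Region": ["Connecticut"] * 4,
    "Year": ["2004-2006"] * 4,
    "Age Range": ["12-17"] * 4,
    "Marijuana Use": [
        "First Use of Marijuana", "First Use of Marijuana",
        "Marijuana Use in the Past Month", "Marijuana Use in the Past Month",
    ],
    "Variable": ["Margins of Error", "Marijuana Use", "Margins of Error", "Marijuana Use"],
    "Value": [0.91, 7.62, 1.38, 8.39],
})

keys = ["Region", "Year", "Age Range", "Marijuana Use"]

# Rename the clashing Variable label before it becomes a column name
renamed = sample.assign(Variable=sample["Variable"].replace({"Marijuana Use": "Usage (%)"}))

# One row per key combination, one column per Variable value
wide = renamed.pivot(index=keys, columns="Variable", values="Value").reset_index()
wide.columns.name = None

print(wide)
```

Unlike the loop, this does not rely on the two rows of a pair being adjacent, but it does require each key combination to appear exactly once per `Variable` value.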


Our next dataset is actually a set of smaller datasets.<br>
They contain the number (in thousands) of uses of a particular type (each type in a separate dataset, for example the dataset `First Use of Marijuana, by Age Group and State: Average Annual Number of Marijuana Initiates (Expressed as Numbers in Thousands of the At-Risk Population), Based on 2015 and 2016 NSDUHs`).<br>
They report 95 percent confidence intervals with small margins of error, so we will drop the 95% Confidence Interval Lower and Upper columns.
The first few rows of each file contain documentation on the dataset; you can read it there, but we will skip it in our dataframe.

In [136]:
data2 = pd.read_csv("NSDUH_Totals2016/NSDUHsaeTotals-Tab02-2016.csv")
data2.columns = data2.iloc[4]
data2 = data2[5:]
data2.head(6)

4,Order,State,12 or Older Estimate,12 or Older 95% CI (Lower),12 or Older 95% CI (Upper),12-17 Estimate,12-17 95% CI (Lower),12-17 95% CI (Upper),18-25 Estimate,18-25 95% CI (Lower),18-25 95% CI (Upper),26 or Older Estimate,26 or Older 95% CI (Lower),26 or Older 95% CI (Upper),18 or Older Estimate,18 or Older 95% CI (Lower),18 or Older 95% CI (Upper)
5,1,Total U.S.,36806,36047,37580,3060,2951,3171,11323,11077,11573,22424,21784,23079,33747,32998,34510
6,2,Northeast,7094,6802,7400,509,475,545,2279,2192,2366,4306,4040,4588,6585,6296,6886
7,3,Midwest,7405,7115,7703,676,638,717,2342,2265,2421,4386,4122,4665,6729,6443,7023
8,4,South,11860,11417,12317,1070,1014,1128,3824,3706,3944,6967,6559,7397,10791,10354,11243
9,5,West,10447,10040,10867,804,753,859,2878,2769,2990,6764,6388,7158,9642,9240,10059
10,6,Alabama,386,335,444,38,31,47,135,118,154,212,171,262,348,299,404
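
The documentation rows can also be skipped while reading the file, which lets pandas infer numeric column types directly. The sketch below uses a small in-memory file as a stand-in; the two preamble lines and `skiprows=2` are assumptions for this toy input, and the real number of documentation lines in the NSDUH files has to be checked by hand.

```python
from io import StringIO

import pandas as pd

# Toy stand-in for an NSDUH file: two documentation lines, then the real header
raw = StringIO(
    "Table title (documentation line)\n"
    "Another documentation line\n"
    "Order,State,12 or Older Estimate\n"
    "1,Total U.S.,36806\n"
    "2,Northeast,7094\n"
)

# Skip the preamble so the third line becomes the header
df = pd.read_csv(raw, skiprows=2)

print(df.dtypes)
print(df)
```

With the header read this way, the estimate columns come in as integers instead of strings, so they can be summed or plotted without further conversion.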


In [132]:
# Drop the confidence-interval bounds and the ordering column
to_del = [col for col in data2.columns if "(Lower)" in col or "(Upper)" in col]
to_del.append("Order")
data2 = data2.drop(columns=to_del)

In [133]:
data2

4,State,12 or Older Estimate,12-17 Estimate,18-25 Estimate,26 or Older Estimate,18 or Older Estimate
5,Total U.S.,36806,3060,11323,22424,33747
6,Northeast,7094,509,2279,4306,6585
7,Midwest,7405,676,2342,4386,6729
8,South,11860,1070,3824,6967,10791
9,West,10447,804,2878,6764,9642
10,Alabama,386,38,135,212,348
11,Alaska,134,11,34,89,123
12,Arizona,696,65,214,417,631
13,Arkansas,274,25,83,166,249
14,California,5296,402,1495,3399,4894


In [62]:
# NOTE: this reads the same Tab02 file as data2 above
data3 = pd.read_csv("NSDUH_Totals2016/NSDUHsaeTotals-Tab02-2016.csv")
data3.columns = data3.iloc[4]
data3 = data3[5:]
data3.head(5)

4,Order,State,12 or Older Estimate,12 or Older 95% CI (Lower),12 or Older 95% CI (Upper),12-17 Estimate,12-17 95% CI (Lower),12-17 95% CI (Upper),18-25 Estimate,18-25 95% CI (Lower),18-25 95% CI (Upper),26 or Older Estimate,26 or Older 95% CI (Lower),26 or Older 95% CI (Upper),18 or Older Estimate,18 or Older 95% CI (Lower),18 or Older 95% CI (Upper)
5,1,Total U.S.,36806,36047,37580,3060,2951,3171,11323,11077,11573,22424,21784,23079,33747,32998,34510
6,2,Northeast,7094,6802,7400,509,475,545,2279,2192,2366,4306,4040,4588,6585,6296,6886
7,3,Midwest,7405,7115,7703,676,638,717,2342,2265,2421,4386,4122,4665,6729,6443,7023
8,4,South,11860,11417,12317,1070,1014,1128,3824,3706,3944,6967,6559,7397,10791,10354,11243
9,5,West,10447,10040,10867,804,753,859,2878,2769,2990,6764,6388,7158,9642,9240,10059


In [63]:
# NOTE: this reads the same Tab02 file as data2 and data3 above
data4 = pd.read_csv("NSDUH_Totals2016/NSDUHsaeTotals-Tab02-2016.csv")
data4.columns = data4.iloc[4]
data4 = data4[5:]
data4.head(5)

4,Order,State,12 or Older Estimate,12 or Older 95% CI (Lower),12 or Older 95% CI (Upper),12-17 Estimate,12-17 95% CI (Lower),12-17 95% CI (Upper),18-25 Estimate,18-25 95% CI (Lower),18-25 95% CI (Upper),26 or Older Estimate,26 or Older 95% CI (Lower),26 or Older 95% CI (Upper),18 or Older Estimate,18 or Older 95% CI (Lower),18 or Older 95% CI (Upper)
5,1,Total U.S.,36806,36047,37580,3060,2951,3171,11323,11077,11573,22424,21784,23079,33747,32998,34510
6,2,Northeast,7094,6802,7400,509,475,545,2279,2192,2366,4306,4040,4588,6585,6296,6886
7,3,Midwest,7405,7115,7703,676,638,717,2342,2265,2421,4386,4122,4665,6729,6443,7023
8,4,South,11860,11417,12317,1070,1014,1128,3824,3706,3944,6967,6559,7397,10791,10354,11243
9,5,West,10447,10040,10867,804,753,859,2878,2769,2990,6764,6388,7158,9642,9240,10059


In [61]:
data5 = pd.read_csv("NSDUH_Totals2016/NSDUHsaeTotals-Tab05-2016.csv")
data5.columns = data5.iloc[5]
data5 = data5[6:]
data5.head(5)

5,Order,State,12 or Older Estimate,12 or Older 95% CI (Lower),12 or Older 95% CI (Upper),12-17 Estimate,12-17 95% CI (Lower),12-17 95% CI (Upper),18-25 Estimate,18-25 95% CI (Lower),18-25 95% CI (Upper),26 or Older Estimate,26 or Older 95% CI (Lower),26 or Older 95% CI (Upper),18 or Older Estimate,18 or Older 95% CI (Lower),18 or Older 95% CI (Upper)
6,1,Total U.S.,3002,2894,3113,1169,1122,1219,1392,1322,1466,440,387,502,1833,1737,1933
7,2,Northeast,543,507,581,197,182,213,261,239,285,85,68,105,346,316,379
8,3,Midwest,648,611,688,251,236,268,311,288,336,86,70,105,397,366,431
9,4,South,1037,978,1099,426,402,452,479,445,516,131,107,160,610,562,662
10,5,West,774,723,830,295,273,319,340,310,373,139,113,170,479,436,527


This dataset covers the whole of the US, broken down by age.<br>
`*-use`: percentage of those in an age group who used `*` in the past 12 months<br>
`*-frequency`: median number of times a user in an age group used `*` in the past 12 months

In [64]:
data_all_uses = pd.read_csv("drug-use-by-age.csv")  # Bad habits by age, 2015
data_all_uses

Unnamed: 0,age,n,alcohol-use,alcohol-frequency,marijuana-use,marijuana-frequency,cocaine-use,cocaine-frequency,crack-use,crack-frequency,...,oxycontin-use,oxycontin-frequency,tranquilizer-use,tranquilizer-frequency,stimulant-use,stimulant-frequency,meth-use,meth-frequency,sedative-use,sedative-frequency
0,12,2798,3.9,3.0,1.1,4.0,0.1,5.0,0.0,-,...,0.1,24.5,0.2,52.0,0.2,2.0,0.0,-,0.2,13.0
1,13,2757,8.5,6.0,3.4,15.0,0.1,1.0,0.0,3.0,...,0.1,41.0,0.3,25.5,0.3,4.0,0.1,5.0,0.1,19.0
2,14,2792,18.1,5.0,8.7,24.0,0.1,5.5,0.0,-,...,0.4,4.5,0.9,5.0,0.8,12.0,0.1,24.0,0.2,16.5
3,15,2956,29.2,6.0,14.5,25.0,0.5,4.0,0.1,9.5,...,0.8,3.0,2.0,4.5,1.5,6.0,0.3,10.5,0.4,30.0
4,16,3058,40.1,10.0,22.5,30.0,1.0,7.0,0.0,1.0,...,1.1,4.0,2.4,11.0,1.8,9.5,0.3,36.0,0.2,3.0
5,17,3038,49.3,13.0,28.0,36.0,2.0,5.0,0.1,21.0,...,1.4,6.0,3.5,7.0,2.8,9.0,0.6,48.0,0.5,6.5
6,18,2469,58.7,24.0,33.7,52.0,3.2,5.0,0.4,10.0,...,1.7,7.0,4.9,12.0,3.0,8.0,0.5,12.0,0.4,10.0
7,19,2223,64.6,36.0,33.4,60.0,4.1,5.5,0.5,2.0,...,1.5,7.5,4.2,4.5,3.3,6.0,0.4,105.0,0.3,6.0
8,20,2271,69.7,48.0,34.0,60.0,4.9,8.0,0.6,5.0,...,1.7,12.0,5.4,10.0,4.0,12.0,0.9,12.0,0.5,4.0
9,21,2354,83.2,52.0,33.0,52.0,4.8,5.0,0.5,17.0,...,1.3,13.5,3.9,7.0,4.1,10.0,0.6,2.0,0.3,9.0
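
Some `*-frequency` cells in the output above contain `-` instead of a number, which makes pandas read those columns as strings. One possible way to handle this, sketched here on three rows copied from the output, is to coerce the frequency columns to numbers so that `-` becomes NaN. Whether NaN is the right interpretation of `-` is an assumption.

```python
import pandas as pd

# Three rows (ages 12-14) with two frequency columns copied from the output
sample = pd.DataFrame({
    "age": [12, 13, 14],
    "crack-frequency": ["-", "3.0", "-"],
    "meth-frequency": ["-", "5.0", "24.0"],
})

# Convert every *-frequency column; values that are not numbers turn into NaN
freq_cols = [c for c in sample.columns if c.endswith("-frequency")]
sample[freq_cols] = sample[freq_cols].apply(pd.to_numeric, errors="coerce")

print(sample.dtypes)
print(sample)
```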


Now let's visualize some of the data we have gathered and then analyze it.

# *VISUALIZATION WILL BE HERE*

We can also compare cannabis usage in the US with the rest of the world by using the following datasets.<br>
Column `VALUE` gives the percentage of people of a particular age, gender and country who[]<br>
Column `SIGNIF_GENDER` tells us whether the difference in value between genders is significant or not

In [282]:
data_eu1 = pd.read_csv("european_countries/HBSC_26_EN.csv")  # 15-year-olds, 2014
data_eu1 = data_eu1[:-5]

In [283]:
#data_eu1

HBSC members are all the countries provided in the dataset.<br>
Column `COUNTRY_GRP` (Country Group) is empty except for the last 3 entries, so we will drop it and set the `COUNTRY` value of those 3 rows to `HBSC_MEMBER`.

In [284]:
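# Fill the missing COUNTRY of the last 3 rows via .loc; an inplace replace on a chained slice may not reach data_eu1
last_rows = data_eu1.index[-3:]
data_eu1.loc[last_rows, "COUNTRY"] = data_eu1.loc[last_rows, "COUNTRY"].fillna("HBSC_MEMBER")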
del data_eu1["COUNTRY_GRP"]
data_eu1 = data_eu1[:-1]

In [285]:
print(data_eu1["AGE_GRP_2"].unique())
print(data_eu1["YEAR"].unique())
del data_eu1["AGE_GRP_2"]
del data_eu1["YEAR"]

['15YO']
[2014.]


Since columns `AGE_GRP_2` and `YEAR` always have the same value, we drop them.<br>
We will also turn each gender into its own column, with its percentage as the value of that column.

In [286]:
data_eu1["FEMALE_VALUE"] = np.nan
data_eu1["MALE_VALUE"] = np.nan
for i in range(len(data_eu1) // 2):
    # Both rows of a country (one per sex) share the same COUNTRY value
    to_merge = data_eu1.loc[data_eu1["COUNTRY"] == data_eu1["COUNTRY"].iloc[i]]
    keep = data_eu1.index[i + 1]
    # Assign with a single .loc call instead of chained indexing, which may write to a copy
    data_eu1.loc[keep, "FEMALE_VALUE"] = to_merge.loc[to_merge["SEX"] == "FEMALE", "VALUE"].iloc[0]
    data_eu1.loc[keep, "MALE_VALUE"] = to_merge.loc[to_merge["SEX"] == "MALE", "VALUE"].iloc[0]
    # Drop the first row of the pair; the second now carries both values
    data_eu1 = data_eu1.drop(data_eu1.index[i])

data_eu1 = data_eu1.drop(columns=["SEX", "VALUE"])
data_eu1.loc[data_eu1.index[-1], "SIGNIF_GENDER"] = "SIGNIF"


In [293]:
data_eu1

Unnamed: 0,COUNTRY,SIGNIF_GENDER,FEMALE_VALUE,MALE_VALUE
1,UKR,SIGNIF,1.0,3.0
3,SWE,SIGNIF,1.0,2.0
5,SVN,NOTSIGNIF,2.0,3.0
7,RUS,NOTSIGNIF,4.0,3.0
9,ROU,SIGNIF,1.0,4.0
11,PRT,NOTSIGNIF,2.0,4.0
13,POL,NOTSIGNIF,4.0,5.0
15,NLD,NOTSIGNIF,3.0,4.0
17,MLT,NOTSIGNIF,3.0,3.0
19,MDA,SIGNIF,0.0,1.0


## Part 3

Support + Related crime
Economics

In [301]:
dataQ = pd.read_csv("marijuana_gross_sales(1).csv")  # Monthly medical and retail gross sales
dataQ

Unnamed: 0,YEAR,MONTH,GROSS_SALES_TYPE,GROSS_SALES
0,2017,SEPTEMBER,Medical Total Gross Sales,18314027.40
1,2017,SEPTEMBER,Retail Total Gross Sales,34950895.10
2,2017,AUGUST,Medical Total Gross Sales,19043315.07
3,2017,AUGUST,Retail Total Gross Sales,35240979.02
4,2017,JULY,Retail Total Gross Sales,34815762.24
5,2017,JULY,Medical Total Gross Sales,17922958.90
6,2017,JUNE,Medical Total Gross Sales,17615041.10
7,2017,JUNE,Retail Total Gross Sales,31908811.19
8,2017,MAY,Retail Total Gross Sales,30671594.41
9,2017,MAY,Medical Total Gross Sales,18276904.11


In [305]:
dataW = pd.read_csv("marijuana_sales_tax_2015.csv", header=1)
dataW

Unnamed: 0,2015,Retail Marijuana Combined Sales Tax Total (Includes Standard Sales Tax Rate and Special 3.5% Retail Sales Tax Rate),Special 3.5% Special Marijuana Retail Sales Tax Rate ONLY,Medical MJ Sales Tax Total
0,January,"$1,094,978.00","$536,003.22","$510,132.00"
1,February,"$1,020,184.00","$499,390.77","$491,069.00"
2,March,"$1,207,890.00","$591,274.83","$544,085.00"
3,April,"$1,374,986.00","$673,070.07","$568,703.00"
4,May,"$1,310,181.00","$641,347.34","$558,107.00"
5,June,"$1,274,621.00","$623,940.35","$579,968.00"
6,July,"$1,418,052.00","$694,151.33","$733,949.00"
7,August,"$1,478,473.00","$723,728.04","$694,621.00"
8,September,"$1,484,782.00","$726,816.36","$667,360.00"
9,October,"$1,365,683.00","$668,516.15","$624,931.00"
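
The tax amounts above are displayed as text such as `$1,094,978.00`, so they cannot be summed or plotted as they are. A minimal sketch of one way to convert such a column, using two values copied from the output and assuming that every cell follows the same `$` plus comma-separated format:

```python
import pandas as pd

# Two rows copied from the dataW output
sample = pd.DataFrame({
    "2015": ["January", "February"],
    "Medical MJ Sales Tax Total": ["$510,132.00", "$491,069.00"],
})

col = "Medical MJ Sales Tax Total"
# Strip the dollar sign and the thousands separators, then convert to float
sample[col] = sample[col].str.replace(r"[$,]", "", regex=True).astype(float)

print(sample)
```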


In [312]:
dataE = pd.read_csv("tax_revenue2018.csv")
dataE

Unnamed: 0,MONTH,Retail Gross Sales,Retail Special,Retail Sales Tax,Retail Sales Tax Total,Last Year Retail Sales Tax Total,YoY % Retail Sales Tax Total,Medical Gross Sales,Medical Sales Tax,Last year Medical Sales Tax,YoY % Medical Sales Tax,Retail & Medical Sales Tax Total,Last Year Retail & Medical Sales Tax,YoY % Retail & Medical Sales Tax
0,January,29.455.423,1.030.945,1.075.117,2.106.062,1.841.552,14.4,14.070.685,513.580,564.015,-8.9,2.619.642,2.405.567,8.9
1,February,27.832.113,974.130,1.015.868,1.989.998,1.815.650,9.6,12.025.370,438.926,524.892,-164.0,2.428.924,2.340.542,3.8
2,March,34.738.044,1.215.828,1.267.947,2.483.775,2.199.043,12.9,13.687.233,499.584,656.681,-23.9,2.983.359,2.855.724,4.5
3,April,33.977.468,1.189.209,1.240.183,2.429.392,2.166.359,12.1,12.842.356,468.746,692.897,-32.3,2.898.138,2.859.256,1.4
4,May,31.713.084,1.109.954,1.157.530,2.267.484,2.189.671,3.6,12.482.411,455.608,670.549,-32.1,2.723.092,2.860.220,-4.8
5,June,32.757.602,1.146.522,1.195.649,2.342.171,2.276.298,2.9,12.220.466,446.047,648.705,-31.2,2.788.218,2.925.003,-4.7
6,July,34.398.616,1.203.948,1.255.549,2.459.497,2.467.141,-0.3,12.471.260,455.201,654.490,-30.4,2.914.698,3.121.631,-6.6
7,August,35.612.280,1.246.427,1.299.852,2.546.279,2.519.730,1.1,13.767.123,502.500,696.146,-27.8,3.048.779,3.215.876,-5.2
8,September,34.210.068,1.197.342,1.248.680,2.446.022,2.519.035,-2.9,13.735.260,501.337,662.939,-24.4,2.947.359,3.181.974,-7.4
9,October,32.307.161,1.776.895,1.179.217,2.956.112,2.419.056,22.2,12.981.973,473.842,633.462,-25.2,3.429.954,3.052.518,12.4
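
In this table the dot plays two roles: it separates thousands in the amount columns (`29.455.423`) but is a decimal point in the `YoY %` columns (`14.4`). A file-wide setting such as `thousands="."` in `read_csv` would therefore also change the percentage columns, so a column-by-column conversion is safer. The sketch below assumes the amount columns arrive as strings, as the output suggests; the list `amount_cols` is a hypothetical selection.

```python
import pandas as pd

# Two rows copied from the dataE output
sample = pd.DataFrame({
    "MONTH": ["January", "February"],
    "Retail Gross Sales": ["29.455.423", "27.832.113"],
    "YoY % Retail Sales Tax Total": [14.4, 9.6],
})

# Only columns that use the dot as a thousands separator (hypothetical selection)
amount_cols = ["Retail Gross Sales"]
for c in amount_cols:
    sample[c] = sample[c].str.replace(".", "", regex=False).astype("int64")

print(sample)
```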


In [31]:
#data_git = pd.read_csv("cannabis-dataset-git/Dataset/Products/products-kushy_api.2017-11-14.csv")
data_git = pd.read_csv("cannabis-dataset-git/Dataset/Shops/shops-kushy_api.2017-11-14.csv")
data_git.head(1)

Unnamed: 0,id,status,sort,name,slug,featured_image,avatar,images,gallery,description,...,coupons,deals,rating,tags,twitter,facebook,instagram,tumblr,googleplus,type
0,1,2,0,Wellness Earth Energy Dispensary,wellness-earth-energy-dispensary/,http://kushy.net/wp-content/uploads/2016/10/th...,http://weedporndaily.com/delivery/wp-content/u...,,,,...,,,,,,,,,,


In [5]:
data5 = pd.read_csv("rows.csv")  # Cannabis strains: type, rating, effects, flavor
data5

Unnamed: 0,Strain,Type,Rating,Effects,Flavor,Description
0,100-Og,hybrid,4.0,"Creative,Energetic,Tingly,Euphoric,Relaxed","Earthy,Sweet,Citrus",$100 OG is a 50/50 hybrid strain that packs a ...
1,98-White-Widow,hybrid,4.7,"Relaxed,Aroused,Creative,Happy,Energetic","Flowery,Violet,Diesel",The ‘98 Aloha White Widow is an especially pot...
2,1024,sativa,4.4,"Uplifted,Happy,Relaxed,Energetic,Creative","Spicy/Herbal,Sage,Woody",1024 is a sativa-dominant hybrid bred in Spain...
3,13-Dawgs,hybrid,4.2,"Tingly,Creative,Hungry,Relaxed,Uplifted","Apricot,Citrus,Grapefruit",13 Dawgs is a hybrid of G13 and Chemdawg genet...
4,24K-Gold,hybrid,4.6,"Happy,Relaxed,Euphoric,Uplifted,Talkative","Citrus,Earthy,Orange","Also known as Kosher Tangie, 24k Gold is a 60%..."
5,3-Bears-Og,indica,0.0,,,3 Bears OG by Mephisto Genetics is an autoflow...
6,3-Kings,hybrid,4.4,"Relaxed,Euphoric,Happy,Uplifted,Hungry","Earthy,Sweet,Pungent","The 3 Kings marijuana strain, a holy trinity o..."
7,303-Og,indica,4.2,"Relaxed,Happy,Euphoric,Uplifted,Giggly","Citrus,Pungent,Earthy",The indica-dominant 303 OG is a Colorado strai...
8,3D-Cbd,sativa,4.6,"Uplifted,Focused,Happy,Talkative,Relaxed","Earthy,Woody,Flowery",3D CBD from Snoop Dogg’s branded line of canna...
9,3X-Crazy,indica,4.4,"Relaxed,Tingly,Happy,Euphoric,Uplifted","Earthy,Grape,Sweet","Also known as Optimus Prime, the indica-domina..."


In [None]:
data5 = pd.read_csv("cannabis.csv")#Cannabis Interactive Analysis with NLP
data5