![](https://news.microsoft.com/uploads/sites/71/2018/03/logo-960x377.jpg)
#### Introduction
The malware industry continues to be a well-organized, well-funded market dedicated to evading traditional security measures. Once a computer is infected by malware, criminals can hurt consumers and enterprises in many ways.

With more than one billion enterprise and consumer customers, Microsoft takes this problem very seriously and is deeply invested in improving security.

As one part of its overall strategy for doing so, Microsoft is challenging the data science community to develop techniques to predict if a machine will soon be hit with malware. As with its previous Malware Challenge (2015), Microsoft is providing Kagglers with an unprecedented malware dataset to encourage open-source progress on effective techniques for predicting malware occurrences.

Can you help protect more than one billion machines from damage BEFORE it happens?

#### Acknowledgements
This competition is hosted by Microsoft, Windows Defender ATP Research, Northeastern University College of Computer and Information Science, and Georgia Tech Institute for Information Security & Privacy.

#### Evaluation

Submissions are evaluated on area under the ROC curve between the predicted probability and the observed label. For each MachineIdentifier in the test set, you must predict a probability for the HasDetections column. The file should contain a header and have the following format:
```
MachineIdentifier,HasDetections
1,0.5
6,0.5
14,0.5
etc.
```
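
To make the metric and the file format concrete, here is a minimal sketch on made-up data. The labels, scores and machine IDs are invented for illustration, and `roc_auc_score` from scikit-learn is used as one common way to compute the area under the ROC curve; it is not necessarily the exact scorer the competition runs.

```python
import io

import pandas as pd
from sklearn.metrics import roc_auc_score

# Toy validation labels and predicted probabilities (invented values).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc_score(y_true, y_score)
print(f"AUC: {auc:.2f}")

# Build a submission frame with the two required columns.
submission = pd.DataFrame({
    "MachineIdentifier": ["id_a", "id_b", "id_c"],  # hypothetical IDs
    "HasDetections": [0.5, 0.5, 0.5],               # constant baseline
})

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real run the last step would be something like `submission.to_csv("submission.csv", index=False)`, where the file name is an assumption.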


Visualization code is adapted from [Home Credit : Complete EDA + Feature Importance ✓✓](https://www.kaggle.com/codename007/home-credit-complete-eda-feature-importance).
Thanks, Lathwal.

In [None]:
import pandas as pd #Analysis 
import matplotlib.pyplot as plt #Visualization
import seaborn as sns #Visualization
import numpy as np #Analysis 
from scipy.stats import norm #Analysis 
from sklearn.preprocessing import StandardScaler #Analysis 
from scipy import stats #Analysis 
import warnings 
warnings.filterwarnings('ignore')
%matplotlib inline
import gc

import os
import string
color = sns.color_palette()

from plotly import tools
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go

from sklearn import model_selection, preprocessing, metrics, ensemble, naive_bayes, linear_model
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD
import lightgbm as lgb

pd.options.mode.chained_assignment = None
pd.options.display.max_columns = 999

import time

### Data Load
The loading code comes from [Load the Totality of the Data](https://www.kaggle.com/theoviel/load-the-totality-of-the-data).
Thanks, Theo Viel.


Set the dtypes in advance so that pandas does not fall back to wider default types while reading.

In [None]:
dtypes = {
        'MachineIdentifier':                                    'object',
        'ProductName':                                          'object',
        'EngineVersion':                                        'object',
        'AppVersion':                                           'object',
        'AvSigVersion':                                         'object',
        'IsBeta':                                               'int8',
        'RtpStateBitfield':                                     'float16',
        'IsSxsPassiveMode':                                     'int8',
        'DefaultBrowsersIdentifier':                            'float16',
        'AVProductStatesIdentifier':                            'float32',
        'AVProductsInstalled':                                  'float16',
        'AVProductsEnabled':                                    'float16',
        'HasTpm':                                               'int8',
        'CountryIdentifier':                                    'int16',
        'CityIdentifier':                                       'float32',
        'OrganizationIdentifier':                               'float16',
        'GeoNameIdentifier':                                    'float16',
        'LocaleEnglishNameIdentifier':                          'int8',
        'Platform':                                             'object',
        'Processor':                                            'object',
        'OsVer':                                                'object',
        'OsBuild':                                              'int16',
        'OsSuite':                                              'int16',
        'OsPlatformSubRelease':                                 'object',
        'OsBuildLab':                                           'object',
        'SkuEdition':                                           'object',
        'IsProtected':                                          'float16',
        'AutoSampleOptIn':                                      'int8',
        'PuaMode':                                              'object',
        'SMode':                                                'float16',
        'IeVerIdentifier':                                      'float16',
        'SmartScreen':                                          'object',
        'Firewall':                                             'float16',
        'UacLuaenable':                                         'float32',
        'Census_MDC2FormFactor':                                'object',
        'Census_DeviceFamily':                                  'object',
        'Census_OEMNameIdentifier':                             'float16',
        'Census_OEMModelIdentifier':                            'float32',
        'Census_ProcessorCoreCount':                            'float16',
        'Census_ProcessorManufacturerIdentifier':               'float16',
        'Census_ProcessorModelIdentifier':                      'float16',
        'Census_ProcessorClass':                                'object',
        'Census_PrimaryDiskTotalCapacity':                      'float32',
        'Census_PrimaryDiskTypeName':                           'object',
        'Census_SystemVolumeTotalCapacity':                     'float32',
        'Census_HasOpticalDiskDrive':                           'int8',
        'Census_TotalPhysicalRAM':                              'float32',
        'Census_ChassisTypeName':                               'object',
        'Census_InternalPrimaryDiagonalDisplaySizeInInches':    'float16',
        'Census_InternalPrimaryDisplayResolutionHorizontal':    'float16',
        'Census_InternalPrimaryDisplayResolutionVertical':      'float16',
        'Census_PowerPlatformRoleName':                         'object',
        'Census_InternalBatteryType':                           'object',
        'Census_InternalBatteryNumberOfCharges':                'float32',
        'Census_OSVersion':                                     'object',
        'Census_OSArchitecture':                                'object',
        'Census_OSBranch':                                      'object',
        'Census_OSBuildNumber':                                 'int16',
        'Census_OSBuildRevision':                               'int32',
        'Census_OSEdition':                                     'object',
        'Census_OSSkuName':                                     'object',
        'Census_OSInstallTypeName':                             'object',
        'Census_OSInstallLanguageIdentifier':                   'float16',
        'Census_OSUILocaleIdentifier':                          'int16',
        'Census_OSWUAutoUpdateOptionsName':                     'object',
        'Census_IsPortableOperatingSystem':                     'int8',
        'Census_GenuineStateName':                              'object',
        'Census_ActivationChannel':                             'object',
        'Census_IsFlightingInternal':                           'float16',
        'Census_IsFlightsDisabled':                             'float16',
        'Census_FlightRing':                                    'object',
        'Census_ThresholdOptIn':                                'float16',
        'Census_FirmwareManufacturerIdentifier':                'float16',
        'Census_FirmwareVersionIdentifier':                     'float32',
        'Census_IsSecureBootEnabled':                           'int8',
        'Census_IsWIMBootEnabled':                              'float16',
        'Census_IsVirtualDevice':                               'float16',
        'Census_IsTouchEnabled':                                'int8',
        'Census_IsPenCapable':                                  'int8',
        'Census_IsAlwaysOnAlwaysConnectedCapable':              'float16',
        'Wdft_IsGamer':                                         'float16',
        'Wdft_RegionIdentifier':                                'float16',
        'HasDetections':                                        'int8'
        }

In [None]:
%time train = pd.read_csv("../input/train.csv", dtype=dtypes)
%time test = pd.read_csv("../input/test.csv", dtype=dtypes)
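
As a rough illustration of why the dtypes are passed to `read_csv`, the sketch below reads the same small, synthetic CSV twice and compares the memory used by the numeric columns. The data and the `small_dtypes` mapping are made up for this example; only the `dtype=` argument mirrors the cell above.

```python
import io

import pandas as pd

# Synthetic CSV with 1000 rows; the values are invented.
csv_text = "MachineIdentifier,IsBeta,HasDetections\n" + "".join(
    f"m{i},0,{i % 2}\n" for i in range(1000)
)

small_dtypes = {
    "MachineIdentifier": "object",
    "IsBeta": "int8",
    "HasDetections": "int8",
}

# Without dtype, pandas infers 64-bit integers for the numeric columns.
default_df = pd.read_csv(io.StringIO(csv_text))
typed_df = pd.read_csv(io.StringIO(csv_text), dtype=small_dtypes)

numeric_cols = ["IsBeta", "HasDetections"]
default_bytes = default_df[numeric_cols].memory_usage(index=False).sum()
typed_bytes = typed_df[numeric_cols].memory_usage(index=False).sum()
print(default_bytes, typed_bytes)
```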

In [None]:
def reduce_mem_usage(df, verbose=True):
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    start_mem = df.memory_usage().sum() / 1024**2    
    for col in df.columns:
        col_type = df[col].dtypes
        if col_type in numerics:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)  
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)    
    end_mem = df.memory_usage().sum() / 1024**2
    if verbose: print('Mem. usage decreased to {:5.2f} Mb ({:.1f}% reduction)'.format(end_mem, 100 * (start_mem - end_mem) / start_mem))
    return df
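
One caveat worth checking: `reduce_mem_usage` picks the target type from the column's minimum and maximum only, so a float column holding large integer identifiers can be cast to `float16` even though `float16` cannot represent every integer above 2048. The sketch below shows the effect on invented values; `survives_cast` is a hypothetical helper, not part of the original kernel.

```python
import numpy as np

# Four distinct identifier-like values stored as floats (invented).
ids = np.array([2048, 2049, 2050, 2051], dtype=np.float32)
as_half = ids.astype(np.float16)

n_before = np.unique(ids).size
n_after = np.unique(as_half).size
print(n_before, n_after)  # fewer distinct values after the cast


def survives_cast(values, target_dtype):
    """Return True if casting to target_dtype and back reproduces every value.

    Columns containing NaN would need np.array_equal(..., equal_nan=True).
    """
    round_trip = values.astype(target_dtype).astype(values.dtype)
    return bool(np.array_equal(values, round_trip))


print(survives_cast(ids, np.float16))
print(survives_cast(np.array([1, 2, 3], dtype=np.float32), np.float16))
```

A check like this could be run per column before downcasting, keeping the wider type whenever the round trip changes a value.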

In [None]:
train = reduce_mem_usage(train)
test = reduce_mem_usage(test)

### Data Exploration

In [None]:
train.head()

In [None]:
train.columns

#### Columns 
Unavailable or self-documenting column names are marked with an "NA".

- MachineIdentifier - Individual machine ID
- ProductName - Defender state information e.g. win8defender
- EngineVersion - Defender state information e.g. 1.1.12603.0
- AppVersion - Defender state information e.g. 4.9.10586.0
- AvSigVersion - Defender state information e.g. 1.217.1014.0
- IsBeta - Defender state information e.g. false
- RtpStateBitfield - NA
- IsSxsPassiveMode - NA
- DefaultBrowsersIdentifier - ID for the machine's default browser
- AVProductStatesIdentifier - ID for the specific configuration of a user's antivirus software
- AVProductsInstalled - NA
- AVProductsEnabled - NA
- HasTpm - True if machine has tpm
- CountryIdentifier - ID for the country the machine is located in
- CityIdentifier - ID for the city the machine is located in
- OrganizationIdentifier - ID for the organization the machine belongs in, organization ID is mapped to both - specific companies and broad industries
- GeoNameIdentifier - ID for the geographic region a machine is located in
- LocaleEnglishNameIdentifier - English name of Locale ID of the current user
- Platform - Calculates platform name (of OS related properties and processor property)
- Processor - This is the process architecture of the installed operating system
- OsVer - Version of the current operating system
- OsBuild - Build of the current operating system
- OsSuite - Product suite mask for the current operating system.
- OsPlatformSubRelease - Returns the OS Platform sub-release (Windows Vista, Windows 7, Windows 8, TH1, TH2)
- OsBuildLab - Build lab that generated the current OS. Example: 9600.17630.amd64fre.winblue_r7.150109-2022
- SkuEdition - The goal of this feature is to use the Product Type defined in the MSDN to map to a 'SKU-Edition' - name that is useful in population reporting. The valid Product Type are defined in %sdxroot%\data\windowseditions.xml. This API has been used since Vista and Server 2008, so there are many Product Types that do not apply to Windows 10. The 'SKU-Edition' is a string value that is in one of three classes of results. The design must handle each class.
- IsProtected - This is a calculated field derived from the Spynet Report's AV Products field. Returns: a. TRUE if there is at least one active and up-to-date antivirus product running on this machine. b. FALSE if there is no active AV product on this machine, or if the AV is active, but is not receiving the latest updates. c. null if there are no Anti Virus Products in the report. Returns: Whether a machine is protected.
- AutoSampleOptIn - This is the SubmitSamplesConsent value passed in from the service, available on CAMP 9+
- PuaMode - Pua Enabled mode from the service
- SMode - This field is set to true when the device is known to be in 'S Mode', as in, Windows 10 S mode, where only Microsoft Store apps can be installed
- IeVerIdentifier - NA
- SmartScreen - This is the SmartScreen enabled string value from registry. This is obtained by checking in order, HKLM\SOFTWARE\Policies\Microsoft\Windows\System\SmartScreenEnabled and HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\SmartScreenEnabled. If the value exists but is blank, the value "ExistsNotSet" is sent in telemetry.
- Firewall - This attribute is true (1) for Windows 8.1 and above if windows firewall is enabled, as reported by the service.
- UacLuaenable - This attribute reports whether or not the "administrator in Admin Approval Mode" user type is disabled or enabled in UAC. The value reported is obtained by reading the regkey HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\EnableLUA.
- Census_MDC2FormFactor - A grouping based on a combination of Device Census level hardware characteristics. The logic used to define Form Factor is rooted in business and industry standards and aligns with how people think about their device. (Examples: Smartphone, Small Tablet, All in One, Convertible...)
- Census_DeviceFamily - AKA DeviceClass. Indicates the type of device that an edition of the OS is intended for. Example values: Windows.Desktop, Windows.Mobile, and iOS.Phone
- Census_OEMNameIdentifier - NA
- Census_OEMModelIdentifier - NA
- Census_ProcessorCoreCount - Number of logical cores in the processor
- Census_ProcessorManufacturerIdentifier - NA
- Census_ProcessorModelIdentifier - NA
- Census_ProcessorClass - A classification of processors into high/medium/low. Initially used for Pricing Level SKU. No longer maintained and updated
- Census_PrimaryDiskTotalCapacity - Amount of disk space on primary disk of the machine in MB
- Census_PrimaryDiskTypeName - Friendly name of Primary Disk Type - HDD or SSD
- Census_SystemVolumeTotalCapacity - The size of the partition that the System volume is installed on in MB
- Census_HasOpticalDiskDrive - True indicates that the machine has an optical disk drive (CD/DVD)
- Census_TotalPhysicalRAM - Retrieves the physical RAM in MB
- Census_ChassisTypeName - Retrieves a numeric representation of what type of chassis the machine has. A value of 0 means xx
- Census_InternalPrimaryDiagonalDisplaySizeInInches - Retrieves the physical diagonal length in inches of the primary display
- Census_InternalPrimaryDisplayResolutionHorizontal - Retrieves the number of pixels in the horizontal direction of the internal display.
- Census_InternalPrimaryDisplayResolutionVertical - Retrieves the number of pixels in the vertical direction of the internal display
- Census_PowerPlatformRoleName - Indicates the OEM preferred power management profile. This value helps identify the basic form factor of the device
- Census_InternalBatteryType - NA
- Census_InternalBatteryNumberOfCharges - NA
- Census_OSVersion - Numeric OS version Example - 10.0.10130.0
- Census_OSArchitecture - Architecture on which the OS is based. Derived from OSVersionFull. Example - amd64
- Census_OSBranch - Branch of the OS extracted from the OsVersionFull. Example - OsBranch = fbl_partner_eeap where OsVersion = 6.4.9813.0.amd64fre.fbl_partner_eeap.140810-0005
- Census_OSBuildNumber - OS Build number extracted from the OsVersionFull. Example - OsBuildNumber = 10512 or 10240
- Census_OSBuildRevision - OS Build revision extracted from the OsVersionFull. Example - OsBuildRevision = 1000 or 16458
- Census_OSEdition - Edition of the current OS. Sourced from HKLM\Software\Microsoft\Windows NT\CurrentVersion@EditionID in registry. Example: Enterprise
- Census_OSSkuName - OS edition friendly name (currently Windows only)
- Census_OSInstallTypeName - Friendly description of what install was used on the machine i.e. clean
- Census_OSInstallLanguageIdentifier - NA
- Census_OSUILocaleIdentifier - NA
- Census_OSWUAutoUpdateOptionsName - Friendly name of the WindowsUpdate auto-update settings on the machine.
- Census_IsPortableOperatingSystem - Indicates whether OS is booted up and running via Windows-To-Go on a USB stick.
- Census_GenuineStateName - Friendly name of OSGenuineStateID. 0 = Genuine
- Census_ActivationChannel - Retail license key or Volume license key for a machine.
- Census_IsFlightingInternal - NA
- Census_IsFlightsDisabled - Indicates if the machine is participating in flighting.
- Census_FlightRing - The ring that the device user would like to receive flights for. This might be different from the ring of the OS which is currently installed if the user changes the ring after getting a flight from a different ring.
- Census_ThresholdOptIn - NA
- Census_FirmwareManufacturerIdentifier - NA
- Census_FirmwareVersionIdentifier - NA
- Census_IsSecureBootEnabled - Indicates if Secure Boot mode is enabled.
- Census_IsWIMBootEnabled - NA
- Census_IsVirtualDevice - Identifies a Virtual Machine (machine learning model)
- Census_IsTouchEnabled - Is this a touch device?
- Census_IsPenCapable - Is the device capable of pen input?
- Census_IsAlwaysOnAlwaysConnectedCapable - Retrieves information about whether the battery enables the device to be AlwaysOnAlwaysConnected.
- Wdft_IsGamer - Indicates whether the device is a gamer device or not based on its hardware combination.
- Wdft_RegionIdentifier - NA

In [None]:
print(train.shape,test.shape)

The dataset is very large!
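
When a file is too large to explore comfortably in one piece, one option is to stream it in chunks and aggregate as you go. The following is a minimal sketch on a tiny synthetic CSV; the chunk size of 4 and the data are assumptions chosen only to keep the example small.

```python
import io
from collections import Counter

import pandas as pd

# Ten synthetic rows standing in for a much larger file.
csv_text = "MachineIdentifier,HasDetections\n" + "".join(
    f"m{i},{i % 2}\n" for i in range(10)
)

totals = Counter()
n_rows = 0

# chunksize makes read_csv return an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    n_rows += len(chunk)
    totals.update(chunk["HasDetections"].value_counts().to_dict())

print(n_rows, dict(totals))
```

For a quick look at the first rows only, `pd.read_csv(path, nrows=...)` is another option.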

In [None]:
temp = train["HasDetections"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
plt.figure(figsize = (14,6))
plt.title('HasDetections 0 vs 1')
sns.set_color_codes("pastel")
sns.barplot(x = 'labels', y="values", data=df)
locs, labels = plt.xticks()
plt.show()

The two classes have nearly the same counts, so there is no class-imbalance problem.
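
The balance can also be checked numerically rather than by eye. This is a small sketch on invented labels; on the real data the same two lines would be applied to `train["HasDetections"]`.

```python
import pandas as pd

# Invented labels with an even split.
labels = pd.Series([0, 1, 1, 0, 1, 0, 0, 1], name="HasDetections")

# normalize=True returns proportions instead of raw counts.
share = labels.value_counts(normalize=True).sort_index()
imbalance_ratio = share.max() / share.min()

print(share)
print(imbalance_ratio)
```

A ratio close to 1 indicates balanced classes.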

In [None]:
temp = train["ProductName"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
plt.figure(figsize = (14,6))
plt.title('Count of ProductName')
sns.set_color_codes("pastel")
sns.barplot(x = 'labels', y="values", data=df)
locs, labels = plt.xticks()
plt.show()

In [None]:
temp

In [None]:
# pandas, numpy, pyplot and seaborn are already imported in the first cell
import matplotlib
import plotly.graph_objs as go
import plotly.offline as py
import plotly.offline as offline
from plotly.offline import init_notebook_mode, iplot

# Enable offline plotly rendering inside the notebook
init_notebook_mode(connected=True)

# cufflinks adds the .iplot() method to pandas DataFrames
import cufflinks as cf
cf.go_offline()

In [None]:
temp = train["EngineVersion"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Engine Version')

EngineVersion has 70 distinct values.
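
With this many categories a pie chart becomes hard to read. One possible approach is to keep only the most frequent values and merge the rest into a single slice. The helper below, `top_k_counts`, is hypothetical and not part of the original kernel, and the version strings are invented.

```python
import pandas as pd


def top_k_counts(series, k=10, other_label="Other"):
    """Value counts with everything beyond the k most frequent values merged."""
    counts = series.value_counts()
    if len(counts) <= k:
        return counts
    top = counts.iloc[:k].copy()
    top[other_label] = counts.iloc[k:].sum()
    return top


# Invented version strings: two frequent values and two rare ones.
versions = pd.Series(
    ["1.1.15100.1"] * 5 + ["1.1.15200.1"] * 3 + ["1.1.14600.4", "1.1.13504.0"]
)
summary = top_k_counts(versions, k=2)
print(summary)
```

The result can be turned into the same `labels`/`values` frame used in the plotting cells, for example `pd.DataFrame({'labels': summary.index, 'values': summary.values})`.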

In [None]:
temp = train["AppVersion"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of App Version')

AvSigVersion has many distinct values (8,531 in total), and some of them are malformed, for example `1.2&#x17;3.1144.0`.
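
Malformed entries like this can be located with a pattern check. The sketch below assumes that a well-formed version is four dot-separated integers, which matches the examples in the column descriptions but is still an assumption; apart from the malformed string quoted above, the sample values are invented.

```python
import pandas as pd

versions = pd.Series(["1.273.1420.0", "1.2&#x17;3.1144.0", "1.263.48.0"])

# Assumed shape of a valid version: four dot-separated integers.
pattern = r"^\d+\.\d+\.\d+\.\d+$"
is_valid = versions.str.match(pattern)

malformed = versions[~is_valid]
print(malformed.tolist())
```

On the full data, `malformed.value_counts()` would show how often each broken value occurs before deciding whether to repair or drop it.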

In [None]:
temp = train["AvSigVersion"].value_counts()
temp

In [None]:
temp = train["IsBeta"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
plt.figure(figsize = (14,6))
plt.title('IsBeta 0 vs 1')
sns.set_color_codes("pastel")
sns.barplot(x = 'labels', y="values", data=df)
locs, labels = plt.xticks()
plt.show()

IsBeta is highly imbalanced: almost every machine has the same value.
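
Columns dominated by a single value can be flagged automatically instead of plotting each one. This is a minimal sketch on invented data; `dominant_share` is a hypothetical helper and the 0.99 threshold is an arbitrary assumption.

```python
import pandas as pd


def dominant_share(series):
    """Fraction of rows taken by the most frequent value (NaN counted too)."""
    return series.value_counts(normalize=True, dropna=False).iloc[0]


# Invented frame: IsBeta is almost constant, HasDetections is balanced.
frame = pd.DataFrame({
    "IsBeta": [0] * 999 + [1],
    "HasDetections": [0, 1] * 500,
})

shares = frame.apply(dominant_share)
near_constant = shares[shares > 0.99].index.tolist()
print(shares)
print(near_constant)
```

Columns flagged this way carry little information on their own and are candidates for removal.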

In [None]:
temp = train["RtpStateBitfield"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of RtpStateBitfield')

In [None]:
temp = train["IsSxsPassiveMode"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of IsSxsPassiveMode')

In [None]:
temp = train["DefaultBrowsersIdentifier"].value_counts()
temp

In [None]:
temp = train["AVProductStatesIdentifier"].value_counts()
temp

In [None]:
temp = train["AVProductsInstalled"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of AVProductsInstalled')

In [None]:
temp = train["AVProductsEnabled"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of AVProductsEnabled')

In [None]:
temp = train["HasTpm"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of HasTpm')

In [None]:
#histogram
f, ax = plt.subplots(figsize=(14, 6))
sns.distplot(train['CountryIdentifier'])

In [None]:
temp = train["OrganizationIdentifier"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of OrganizationIdentifier')

In [None]:
temp = train["GeoNameIdentifier"].value_counts()
temp

In [None]:
temp = train["LocaleEnglishNameIdentifier"].value_counts()
temp


In [None]:
temp = train["Platform"].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Platform')

In [None]:
temp = train["Processor"].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Processor')

In [None]:
temp = train["OsVer"].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of OsVer')

In [None]:
temp = train["OsBuild"].value_counts()
temp

In [None]:
temp = train["OsSuite"].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of OsSuite')

In [None]:
temp = train['OsPlatformSubRelease'].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of OsPlatformSubRelease')

In [None]:
temp = train['OsBuildLab'].value_counts()

temp

In [None]:
temp = train['SkuEdition'].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of SkuEdition')

In [None]:
temp = train['IsProtected'].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of IsProtected')

In [None]:
temp = train['AutoSampleOptIn'].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of AutoSampleOptIn')

In [None]:
temp = train['PuaMode'].value_counts()
temp

PuaMode is missing (NA) for most rows.
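
Note that `value_counts()` drops missing values by default, so the counts above do not show how many rows are NA. A small sketch on invented values shows two ways to make the missing share visible.

```python
import numpy as np
import pandas as pd

# Toy column: two observed values and four missing ones (invented).
pua = pd.Series(["on", np.nan, np.nan, np.nan, "audit", np.nan], name="PuaMode")

without_na = pua.value_counts()            # NaN rows are excluded
with_na = pua.value_counts(dropna=False)   # NaN gets its own row
missing_share = pua.isna().mean()          # fraction of missing rows

print(with_na)
print(missing_share)
```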

In [None]:
temp = train['PuaMode'].value_counts()
df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of PuaMode')

In [None]:
temp = train['SMode'].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of SMode')

In [None]:
temp = train['IeVerIdentifier'].value_counts()

temp

In [None]:
temp = train['SmartScreen'].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of SmartScreen')

In [None]:
temp = train['Firewall'].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Firewall')

In [None]:
temp = train['UacLuaenable'].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of UacLuaenable')

#### Census columns

In [None]:
temp = train['Census_MDC2FormFactor'].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_MDC2FormFactor')

In [None]:
temp = train['Census_DeviceFamily'].value_counts()

df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_DeviceFamily')

In [None]:
temp = train['Census_OEMNameIdentifier'].value_counts()
temp

In [None]:
temp = train['Census_OEMModelIdentifier'].value_counts()
temp

In [None]:
temp = train['Census_ProcessorCoreCount'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_ProcessorCoreCount')

In [None]:
temp = train['Census_ProcessorManufacturerIdentifier'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_ProcessorManufacturerIdentifier')

In [None]:
temp = train['Census_ProcessorModelIdentifier'].value_counts()
temp

In [None]:
temp = train['Census_ProcessorClass'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_ProcessorClass')

In [None]:
train['Census_PrimaryDiskTotalCapacity'].describe()

In [None]:
temp = train['Census_PrimaryDiskTypeName'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_PrimaryDiskTypeName')

In [None]:
train['Census_SystemVolumeTotalCapacity'].describe()

In [None]:
temp = train['Census_HasOpticalDiskDrive'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_HasOpticalDiskDrive')

In [None]:
train['Census_TotalPhysicalRAM'].describe()

In [None]:
temp = train['Census_ChassisTypeName'].value_counts()
temp

In [None]:
train['Census_InternalPrimaryDiagonalDisplaySizeInInches'].describe()

In [None]:
train['Census_InternalPrimaryDisplayResolutionHorizontal'].describe()

In [None]:
train['Census_InternalPrimaryDisplayResolutionVertical'].describe()

In [None]:
temp = train['Census_PowerPlatformRoleName'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_PowerPlatformRoleName')

In [None]:
temp = train['Census_InternalBatteryType'].value_counts()
temp

In [None]:
train['Census_InternalBatteryNumberOfCharges'].describe()

In [None]:
temp = train['Census_OSVersion'].value_counts()
temp

In [None]:
temp = train['Census_OSArchitecture'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_OSArchitecture')

In [None]:
temp = train['Census_OSBranch'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_OSBranch')

In [None]:
temp = train['Census_OSBuildNumber'].value_counts()
temp

In [None]:
temp = train['Census_OSBuildRevision'].value_counts()
temp

In [None]:
temp = train['Census_OSEdition'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_OSEdition')

In [None]:
temp = train['Census_OSSkuName'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_OSSkuName')

In [None]:
temp = train['Census_OSInstallTypeName'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_OSInstallTypeName')

In [None]:
temp = train['Census_OSInstallLanguageIdentifier'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_OSInstallLanguageIdentifier')

In [None]:
temp = train['Census_OSUILocaleIdentifier'].value_counts()
temp

In [None]:
temp = train['Census_OSWUAutoUpdateOptionsName'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_OSWUAutoUpdateOptionsName')

In [None]:
temp = train['Census_IsPortableOperatingSystem'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_IsPortableOperatingSystem')

In [None]:
temp = train['Census_GenuineStateName'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_GenuineStateName')

In [None]:
temp = train['Census_ActivationChannel'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_ActivationChannel')

In [None]:
temp = train['Census_IsFlightingInternal'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_IsFlightingInternal')

In [None]:
temp = train['Census_IsFlightsDisabled'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_IsFlightsDisabled')

In [None]:
temp = train['Census_FlightRing'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_FlightRing')

In [None]:
temp = train['Census_ThresholdOptIn'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_ThresholdOptIn')

In [None]:
temp = train['Census_FirmwareManufacturerIdentifier'].value_counts()
temp

In [None]:
temp = train['Census_FirmwareVersionIdentifier'].value_counts()
temp

In [None]:
temp = train['Census_IsSecureBootEnabled'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_IsSecureBootEnabled')

In [None]:
temp = train['Census_IsWIMBootEnabled'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_IsWIMBootEnabled')

In [None]:
temp = train['Census_IsWIMBootEnabled'].value_counts()
temp

In [None]:
temp = train['Census_IsVirtualDevice'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_IsVirtualDevice')

In [None]:
temp = train['Census_IsTouchEnabled'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_IsTouchEnabled')

In [None]:
temp = train['Census_IsPenCapable'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_IsPenCapable')

In [None]:
temp = train['Census_IsAlwaysOnAlwaysConnectedCapable'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Census_IsAlwaysOnAlwaysConnectedCapable')

In [None]:
temp = train['Wdft_IsGamer'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Wdft_IsGamer')

In [None]:
temp = train['Wdft_RegionIdentifier'].value_counts()


df = pd.DataFrame({'labels': temp.index,
                   'values': temp.values
                  })
df.iplot(kind='pie',labels='labels',values='values', title='Count of Wdft_RegionIdentifier')

### Data Exploration with Target Variable
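
The cells in this section all follow one pattern: for each value of a feature, count the rows where `HasDetections` is 1 and where it is 0, then express both counts as a percentage of all rows with a non-missing value for that feature. As a minimal sketch of that computation in vectorized form, the helper below uses `pd.crosstab` on a small toy frame standing in for `train`. The name `detection_share` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def detection_share(frame, feature, target="HasDetections"):
    """Per-value detected / not-detected shares, in % of all non-missing rows."""
    # Cross-tabulate feature values against the binary target;
    # rows with a missing feature value are dropped, as value_counts() does.
    counts = pd.crosstab(frame[feature], frame[target])
    # Keep both target columns even if one class never occurs.
    counts = counts.reindex(columns=[0, 1], fill_value=0)
    # Order feature values by frequency, most common first.
    order = counts.sum(axis=1).sort_values(ascending=False).index
    counts = counts.loc[order]
    # Divide by the total number of counted rows, not by the per-value total.
    share = counts * 100 / counts.to_numpy().sum()
    return share.rename(columns={0: "Not Detected", 1: "Detected"})


# Toy data (hypothetical), standing in for the competition's train frame.
toy = pd.DataFrame({
    "Platform": ["windows10", "windows10", "windows8", "windows10", "windows7"],
    "HasDetections": [1, 0, 1, 1, 0],
})
share = detection_share(toy, "Platform")
print(share)
```

The two resulting columns could be passed as the `y` values of the `Detected` and `Not Detected` bar traces. On a large frame, a single cross-tabulation is likely to be much faster than filtering `train` twice per feature value inside a Python loop.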

In [None]:
temp = train["EngineVersion"].value_counts()
# For each EngineVersion value, count machines with (1) and without (0) detections
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["EngineVersion"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["EngineVersion"]==val] == 0))
# temp.sum() is the number of rows with a non-missing EngineVersion,
# so each bar is a share of all such rows, not of its own category
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100,
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "EngineVersion: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["AppVersion"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["AppVersion"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["AppVersion"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "AppVersion: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["RtpStateBitfield"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["RtpStateBitfield"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["RtpStateBitfield"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "RtpStateBitfield: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["IsSxsPassiveMode"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["IsSxsPassiveMode"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["IsSxsPassiveMode"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "IsSxsPassiveMode: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["AVProductsInstalled"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["AVProductsInstalled"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["AVProductsInstalled"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "AVProductsInstalled: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["OrganizationIdentifier"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["OrganizationIdentifier"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["OrganizationIdentifier"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "OrganizationIdentifier: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Platform"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Platform"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Platform"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Platform: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Processor"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Processor"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Processor"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Processor: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["OsVer"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["OsVer"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["OsVer"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "OsVer: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["OsSuite"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["OsSuite"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["OsSuite"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "OsSuite: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["OsPlatformSubRelease"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["OsPlatformSubRelease"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["OsPlatformSubRelease"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "OsPlatformSubRelease: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["SkuEdition"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["SkuEdition"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["SkuEdition"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "SkuEdition: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["IsProtected"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["IsProtected"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["IsProtected"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "IsProtected: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["SmartScreen"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["SmartScreen"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["SmartScreen"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "SmartScreen: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["UacLuaenable"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["UacLuaenable"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["UacLuaenable"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "UacLuaenable: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Census_MDC2FormFactor"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_MDC2FormFactor"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_MDC2FormFactor"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_MDC2FormFactor: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Census_DeviceFamily"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_DeviceFamily"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_DeviceFamily"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_DeviceFamily: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Census_ProcessorCoreCount"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_ProcessorCoreCount"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_ProcessorCoreCount"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_ProcessorCoreCount: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Census_ProcessorManufacturerIdentifier"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_ProcessorManufacturerIdentifier"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_ProcessorManufacturerIdentifier"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_ProcessorManufacturerIdentifier: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Census_ProcessorClass"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_ProcessorClass"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_ProcessorClass"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_ProcessorClass: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Census_PrimaryDiskTypeName"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_PrimaryDiskTypeName"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_PrimaryDiskTypeName"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_PrimaryDiskTypeName: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Census_HasOpticalDiskDrive"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_HasOpticalDiskDrive"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_HasOpticalDiskDrive"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_HasOpticalDiskDrive: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Census_PowerPlatformRoleName"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_PowerPlatformRoleName"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_PowerPlatformRoleName"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_PowerPlatformRoleName: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
)
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
temp = train["Census_OSArchitecture"].value_counts()
#print(temp.values)
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_OSArchitecture"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_OSArchitecture"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_OSArchitecture: detected vs. not detected (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
# Count machines per Census_OSBranch value, most frequent first
temp = train["Census_OSBranch"].value_counts()
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_OSBranch"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_OSBranch"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_OSBranch by HasDetections (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
# Count machines per Census_OSEdition value, most frequent first
temp = train["Census_OSEdition"].value_counts()
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_OSEdition"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_OSEdition"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_OSEdition by HasDetections (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
# Count machines per Census_OSSkuName value, most frequent first
temp = train["Census_OSSkuName"].value_counts()
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_OSSkuName"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_OSSkuName"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_OSSkuName by HasDetections (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
# Count machines per Census_OSInstallTypeName value, most frequent first
temp = train["Census_OSInstallTypeName"].value_counts()
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_OSInstallTypeName"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_OSInstallTypeName"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_OSInstallTypeName by HasDetections (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
# Count machines per Census_OSWUAutoUpdateOptionsName value, most frequent first
temp = train["Census_OSWUAutoUpdateOptionsName"].value_counts()
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_OSWUAutoUpdateOptionsName"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_OSWUAutoUpdateOptionsName"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_OSWUAutoUpdateOptionsName by HasDetections (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
# Count machines per Census_GenuineStateName value, most frequent first
temp = train["Census_GenuineStateName"].value_counts()
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_GenuineStateName"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_GenuineStateName"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_GenuineStateName by HasDetections (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
# Count machines per Census_ActivationChannel value, most frequent first
temp = train["Census_ActivationChannel"].value_counts()
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_ActivationChannel"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_ActivationChannel"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_ActivationChannel by HasDetections (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
# Count machines per Census_FlightRing value, most frequent first
temp = train["Census_FlightRing"].value_counts()
temp_y0 = []
temp_y1 = []
for val in temp.index:
    temp_y1.append(np.sum(train["HasDetections"][train["Census_FlightRing"]==val] == 1))
    temp_y0.append(np.sum(train["HasDetections"][train["Census_FlightRing"]==val] == 0))    
trace1 = go.Bar(
    x = temp.index,
    y = (temp_y1 / temp.sum()) * 100,
    name='Detected'
)
trace2 = go.Bar(
    x = temp.index,
    y = (temp_y0 / temp.sum()) * 100, 
    name='Not Detected'
)

data = [trace1, trace2]
layout = go.Layout(
    title = "Census_FlightRing by HasDetections (% of machines)",
    #barmode='stack',
    width = 1000,
    xaxis=dict(
        title='',
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    yaxis=dict(
        title='Count in %',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)
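
The bars in these charts are percentages of all machines, so rare category values produce very small bars even when most of their machines have detections. A complementary view, sketched below under assumptions, is the detection rate within each category. The helper name `detection_rate`, its `min_count` parameter, and the toy data are hypothetical and not part of the original kernel.

```python
import pandas as pd

def detection_rate(df, column, target="HasDetections", min_count=1):
    """Row count and detection rate (%) within each category value.

    Hypothetical helper; assumes the target is coded as 0/1.
    """
    # size = rows per category, mean of a 0/1 target = fraction detected
    grouped = df.groupby(column)[target].agg(["size", "mean"])
    grouped = grouped.rename(columns={"size": "count", "mean": "rate"})
    # Optionally drop categories with too few rows to give a stable rate
    grouped = grouped[grouped["count"] >= min_count]
    # Express the rate as a percentage
    grouped = grouped.assign(rate=grouped["rate"] * 100)
    return grouped.sort_values("count", ascending=False)

# Toy data standing in for `train` (illustrative values only)
toy_ring = pd.DataFrame({
    "Census_FlightRing": ["Retail", "Retail", "Retail", "Retail", "NOT_SET", "NOT_SET"],
    "HasDetections": [1, 0, 0, 0, 1, 1],
})
rates = detection_rate(toy_ring, "Census_FlightRing")
frequent_only = detection_rate(toy_ring, "Census_FlightRing", min_count=3)
print(rates)
```

Rates for categories with very few rows are noisy, which is why the sketch exposes a minimum row count; the right threshold depends on the data.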

### Missing Values

In [None]:
# Count and percentage of missing values per column, most incomplete first
total = train.isnull().sum().sort_values(ascending = False)
percent = (train.isnull().sum()/train.isnull().count()*100).sort_values(ascending = False)
missing_train_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_train_data.head(20)
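
The cell above calls `isnull()` three times on the full training frame. A minimal sketch of the same summary that builds the null mask once is shown below; the helper name `missing_summary` and the toy DataFrame are hypothetical. It relies on the mean of a boolean mask being the fraction of missing values.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Missing-value count and percentage per column, most incomplete first."""
    is_missing = df.isnull()  # boolean mask, computed once
    summary = pd.DataFrame({
        "Total": is_missing.sum(),            # number of missing values
        "Percent": is_missing.mean() * 100,   # share of rows that are missing
    })
    return summary.sort_values("Total", ascending=False)

# Toy data standing in for `train` (illustrative values only)
toy_missing = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": ["x", None, "y", "z"],
    "c": [1, 2, 3, 4],
})
summary = missing_summary(toy_missing)
print(summary.head(20))
```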

### To be updated: relationship between the target and the features