# Analysis: Stock Screener (from NASDAQ)

- Load data.
- Select columns.
- Format columns.
- Process text columns.
- Analysis.
- Save into processed data folder.
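
The last step in the list, saving to the processed data folder, has no cell of its own in this notebook. A minimal sketch of how it could look is given here. The helper name `save_processed`, the folder name `processed`, and the file name `stock_screener.csv` are assumptions for illustration, not values taken from `config`.

```python
import os
import tempfile

import pandas as pd


def save_processed(frame, folder, filename):
    """Write `frame` as CSV into `folder`, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path_output = os.path.join(folder, filename)
    # index=False keeps the positional index out of the saved file
    frame.to_csv(path_output, index=False)
    return path_output


# A temporary directory stands in for the project's processed data folder.
with tempfile.TemporaryDirectory() as tmp:
    toy = pd.DataFrame({"symbol": ["SAN"], "country": ["spain"]})
    path = save_processed(toy, os.path.join(tmp, "processed"), "stock_screener.csv")
    print(pd.read_csv(path).shape)
```

Writing with `index=False` avoids an extra `Unnamed: 0` column when the file is read back later.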

### Reference:

- [nasdaq.com: Stock Screener](https://www.nasdaq.com/market-activity/stocks/screener)

In [1]:
import os
import sys
sys.path.append('../../')
import pandas as pd
import config
from cleantext import clean

### arguments

In [2]:
l_columns = ['Symbol', 'Name', 'Market Cap', 'Country', 'IPO Year', 'Sector', 'Industry']
d_types_columns = {'Symbol':str, 'Name':str, 'Market Cap':float, 'Country':str, 'IPO Year':float, 'Sector':str, 'Industry':str}
d_rename_columns = {'Symbol':"symbol", 'Name':"name", 'Market Cap':"market_cap", 'Country':"country", 
                    'IPO Year':"year", 'Sector':"sector", 'Industry':"industry"}
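
Since `pd.read_csv` raises an error when a name passed in `usecols` is missing from the file, it can help to check the header first. The sketch below reads only the header row and reports which expected columns are absent. The helper `missing_columns` and the in-memory sample CSV are hypothetical and only stand in for the real screener export.

```python
import io

import pandas as pd

# Columns the notebook expects (same values as l_columns).
l_columns = ['Symbol', 'Name', 'Market Cap', 'Country', 'IPO Year', 'Sector', 'Industry']


def missing_columns(csv_source, expected):
    """Return the expected columns that are absent from the CSV header."""
    # nrows=0 parses only the header row, so no data rows are loaded
    header = pd.read_csv(csv_source, nrows=0).columns
    return [col for col in expected if col not in header]


# Hypothetical header that lacks the 'IPO Year' column.
sample_csv = io.StringIO("Symbol,Name,Market Cap,Country,Sector,Industry\n")
print(missing_columns(sample_csv, l_columns))
```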

## load data

In [3]:
# create path
path_input = os.path.join(config.folder_project, config.folder_external, config.file_stock_screener)
# load data
df = pd.read_csv(path_input, usecols = l_columns, dtype = d_types_columns)
# rename columns 
df.rename(columns = d_rename_columns, inplace = True)
# display
df.shape

(7503, 7)

## data processing

In [4]:
# clean text columns (company name, country, sector, industry);
# missing values (NaN) are not strings and are left untouched
for column in ["name", "country", "sector", "industry"]:
    df[column] = df[column].apply(lambda x: clean(x, no_punct = True) if isinstance(x, str) else x)
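
To make the effect of the cleaning step easier to see without installing `cleantext`, here is a rough standard-library stand-in. It only approximates `clean(x, no_punct = True)`: it lowercases, drops punctuation, and collapses whitespace, whereas the real function applies further normalization. The name `simple_clean` and the raw input strings are assumptions chosen to reproduce cleaned values that appear in the outputs of this notebook.

```python
import re


def simple_clean(value):
    """Lowercase, drop punctuation, collapse whitespace; pass non-strings through."""
    if not isinstance(value, str):
        return value  # e.g. NaN for a missing country or sector
    text = value.lower()
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation without a replacement
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
    return text.strip()


# Hypothetical raw values; the cleaned forms match entries seen in the industry output.
print(simple_clean("Biotechnology: Pharmaceutical Preparations"))
print(simple_clean("Tools/Hardware"))
```

Because punctuation is removed rather than replaced by a space, a value such as `Tools/Hardware` collapses into a single token, which explains entries like `toolshardware`.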


## analyze categorical variables

In [5]:
# country
df["country"].dropna().unique()

array(['united states', 'china', 'canada', 'united kingdom', 'brazil',
       'hong kong', 'bermuda', 'switzerland', 'ireland', 'netherlands',
       'singapore', 'germany', 'gibraltar', 'luxembourg', 'australia',
       'chile', 'israel', 'sweden', 'mexico', 'united arab emirates',
       'taiwan', 'malaysia', 'argentina', 'denmark', 'south africa',
       'france', 'japan', 'peru', 'spain', 'cayman islands', 'panama',
       'belgium', 'new zealand', 'colombia', 'greece', 'cyprus',
       'south korea', 'jersey', 'uruguay', 'guernsey', 'macau', 'italy',
       'norway', 'costa rica', 'puerto rico', 'kazakhstan', 'monaco',
       'malta', 'india', 'turkey', 'jordan', 'finland', 'isle of man',
       'curacao', 'bahamas', 'philippines', 'indonesia', 'thailand'],
      dtype=object)

In [6]:
# sector
df["sector"].dropna().unique()

array(['industrials', 'consumer discretionary', 'finance', 'health care',
       'real estate', 'miscellaneous', 'technology', 'consumer staples',
       'energy', 'utilities', 'basic materials', 'telecommunications'],
      dtype=object)

In [7]:
# industry
df["industry"].dropna().value_counts()

industry
biotechnology pharmaceutical preparations        650
blank checks                                     559
major banks                                      327
computer software prepackaged software           242
real estate investment trusts                    237
                                                ... 
diversified electronic products                    1
assisted living services                           1
tobacco                                            1
general bldg contractors nonresidential bldgs      1
toolshardware                                      1
Name: count, Length: 150, dtype: int64
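
The three cells above inspect one column at a time. A compact alternative, sketched here under the assumption that only the number of distinct values and the number of missing values are of interest, summarizes all categorical columns in one table. The helper `summarize_categoricals` and the toy frame are hypothetical.

```python
import pandas as pd


def summarize_categoricals(frame, columns):
    """Return the distinct-value count and missing-value count per column."""
    return pd.DataFrame({
        "n_unique": frame[columns].nunique(dropna=True),
        "n_missing": frame[columns].isna().sum(),
    })


# Toy data standing in for the cleaned screener frame.
toy = pd.DataFrame({
    "country": ["chile", "brazil", None, "chile"],
    "sector": ["finance", "finance", "energy", None],
    "industry": ["commercial banks", "commercial banks", None, None],
})
print(summarize_categoricals(toy, ["country", "sector", "industry"]))
```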

## analyze availability

In [22]:
# na = False treats rows with a missing name as non-matches
df[df["name"].str.contains("santander", na = False)]

| | symbol | name | market_cap | country | year | sector | industry |
|---|---|---|---|---|---|---|---|
| 1133 | BSAC | banco santander chile ads | 9054836000.0 | chile | NaN | finance | commercial banks |
| 1135 | BSBR | banco santander brasil sa american depositary ... | 40601690000.0 | brazil | 2009.0 | finance | commercial banks |
| 5961 | SAN | banco santander sa sponsored adr spain | 63818730000.0 | spain | NaN | finance | commercial banks |
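
The same availability check can be repeated for other companies with a small helper. The sketch below is one possible form: `find_by_name` is a hypothetical function, the toy frame reuses two rows from the output above plus a placeholder row with a missing name, and `bbva` is only an example search term.

```python
import pandas as pd


def find_by_name(frame, fragment):
    """Return rows whose name contains `fragment` (case-insensitive, literal match)."""
    # regex=False treats the fragment as plain text; na=False skips missing names
    mask = frame["name"].str.contains(fragment, case=False, regex=False, na=False)
    return frame[mask]


toy = pd.DataFrame({
    "symbol": ["BSAC", "SAN", "XXXX"],  # "XXXX" is a placeholder, not a real listing
    "name": ["banco santander chile ads", "banco santander sa sponsored adr spain", None],
})

for fragment in ["santander", "bbva"]:
    matches = find_by_name(toy, fragment)
    print(fragment, list(matches["symbol"]))
```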


## Conclusions

Judging by this example and others, I am afraid this list is of limited value.