![](https://signals.numer.ai/homepage-signals/img/signals-logo.png)

Numerai is a hedge fund that invests in stocks from nearly all over the world.

Being Japanese, I am curious to see which Japanese stocks Numerai has an appetite for :D

# Libraries

In [None]:
!pip install numerapi==2.3.8
import numerapi

In [None]:
!pip install xlrd

In [None]:
# !pip install git+https://github.com/leonhma/yfinance.git #drop-in replacement yfinance fork for failed downloads, h/t ceunen
# !pip install simplejson
# import yfinance
# import simplejson

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import gc
import pathlib
from tqdm.auto import tqdm
import json
from multiprocessing import Pool, cpu_count
import time
import requests  # not aliased as "re", which would shadow the standard-library regex module
from datetime import datetime
from dateutil.relativedelta import relativedelta, FR

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os

# visualize
import matplotlib.pyplot as plt
import matplotlib.style as style
from matplotlib_venn import venn2, venn3
import seaborn as sns
from matplotlib import pyplot
from matplotlib.ticker import ScalarFormatter
sns.set_context("talk")
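# 'seaborn-colorblind' was renamed 'seaborn-v0_8-colorblind' in matplotlib 3.6; use whichever is available
style.use('seaborn-v0_8-colorblind' if 'seaborn-v0_8-colorblind' in style.available else 'seaborn-colorblind')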

import warnings
warnings.simplefilter('ignore')

# for dirname, _, filenames in os.walk('/kaggle/input'):
#     for filename in filenames:
#         print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Today

In [None]:
today = datetime.now().strftime('%Y-%m-%d')
today

# Config

In [None]:
class CFG:
    jpx_path = '../input/japanese-stocks-statistics-from-jpx'
    OUTPUT_DIR = './'

In [None]:
# Logging is always nice for your experiment:)
def init_logger(log_file='train.log'):
    from logging import getLogger, INFO, FileHandler,  Formatter,  StreamHandler
    logger = getLogger(__name__)
    logger.setLevel(INFO)
    handler1 = StreamHandler()
    handler1.setFormatter(Formatter("%(message)s"))
    handler2 = FileHandler(filename=log_file)
    handler2.setFormatter(Formatter("%(message)s"))
    logger.addHandler(handler1)
    logger.addHandler(handler2)
    return logger

logger = init_logger(log_file=f'{CFG.OUTPUT_DIR}/{today}.log')
logger.info(f'Start Logging...today is {today}')
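
One thing to watch in a notebook: each call to a setup function like the one above attaches two more handlers, so re-running the cell makes every message appear several times. Below is a minimal sketch of a variant that only attaches handlers once. The names `get_logger`, `numerai_jp_demo`, and the temporary log path are hypothetical and chosen just for this illustration.

```python
import logging
import os
import tempfile


def get_logger(name, log_file):
    """Return a logger that writes to the console and to log_file.

    Handlers are attached only on the first call, so re-running the
    cell does not duplicate every log line.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # skip if this logger is already configured
        formatter = logging.Formatter("%(message)s")
        for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger


# Hypothetical usage: a throwaway log file in a temporary directory.
demo_log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
demo_logger = get_logger("numerai_jp_demo", demo_log_path)
demo_logger = get_logger("numerai_jp_demo", demo_log_path)  # second call adds nothing
demo_logger.info("logger ready")
```

The check on `logger.handlers` is the whole trick; everything else mirrors the original setup.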

# Get Numerai-Eligible Tickers
This is the universe of tickers that Numerai puts its money on :D

In [None]:
napi = numerapi.SignalsAPI()
logger.info('numerai api setup!')

In [None]:
# read in list of active Signals tickers which can change slightly era to era
eligible_tickers = pd.Series(napi.ticker_universe(), name='ticker') 
logger.info(f"Number of eligible tickers: {len(eligible_tickers)}")

In [None]:
# read in the Yahoo-to-Numerai ticker map, h/t wsouza and 
# this ticker map is a work in progress and not guaranteed to be 100% correct
ticker_map = pd.read_csv('https://numerai-signals-public-data.s3-us-west-2.amazonaws.com/signals_ticker_map_w_bbg.csv')
ticker_map = ticker_map[ticker_map.bloomberg_ticker.isin(eligible_tickers)]

numerai_tickers = ticker_map['bloomberg_ticker']
yfinance_tickers = ticker_map['yahoo']
logger.info(f"Number of eligible tickers in map: {len(ticker_map):,}")

In [None]:
print(ticker_map.shape)
ticker_map.head()
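
To make the filtering step concrete, here is a small sketch on made-up rows. It assumes only the two column names used above (`bloomberg_ticker` and `yahoo`); the rows themselves, and the names `toy_map`, `toy_universe`, `bbg_to_yahoo`, and `unmapped`, are illustrative and not taken from the real CSV.

```python
import pandas as pd

# Illustrative rows only; the real map is loaded from the CSV above.
toy_map = pd.DataFrame({
    "bloomberg_ticker": ["7203 JT", "6758 JT", "AAPL US"],
    "yahoo": ["7203.T", "6758.T", "AAPL"],
})
toy_universe = pd.Series(["7203 JT", "AAPL US", "MSFT US"], name="ticker")

# Keep only rows whose Bloomberg ticker is in the eligible universe.
toy_eligible = toy_map[toy_map["bloomberg_ticker"].isin(toy_universe)]

# Bloomberg -> Yahoo lookup for the eligible tickers.
bbg_to_yahoo = dict(zip(toy_eligible["bloomberg_ticker"], toy_eligible["yahoo"]))

# Universe tickers that the map does not cover at all.
unmapped = sorted(set(toy_universe) - set(toy_map["bloomberg_ticker"]))
```

Checking `unmapped` on the real data is a quick way to see how many eligible tickers the map misses.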

# Get JP tickers from Numerai Signals Target
This is an easy task: we just need the tickers that end with 'JT'.

In [None]:
all_tickers = ticker_map['ticker'].unique().tolist()
# Japanese listings end with 'JT' and start with a numeric security code
jp_tickers = [t for t in all_tickers if t.endswith('JT') and t[0].isdigit()]
logger.info('Among the {:,} tickers of interest to Numerai, there are {:,} JP tickers!'.format(
    len(all_tickers), len(jp_tickers))
)
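
The same filter can also be written with pandas string methods, which is handy if the tickers are already in a Series. This is just a sketch on made-up tickers (`toy_tickers` is a hypothetical name), applying the same two conditions as the list comprehension above.

```python
import pandas as pd

# Illustrative tickers only, not read from the real ticker map.
toy_tickers = pd.Series(["7203 JT", "AAPL US", "9984 JT", "005930 KS"])

# Same two conditions as above: ends with 'JT' and starts with a digit.
is_jp = toy_tickers.str.endswith("JT") & toy_tickers.str[0].str.isdigit()
toy_jp = toy_tickers[is_jp].tolist()
```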


# JP stocks in the Numerai Universe

OK, now we map each ticker to its company name! To this end, I use data from the JPX (Japan Exchange Group).

The data are available on their [website](https://www.jpx.co.jp/markets/statistics-equities/misc/06.html), or in this [Kaggle dataset](https://www.kaggle.com/code1110/japanese-stocks-statistics-from-jpx).

In [None]:
os.listdir(CFG.jpx_path)

In [None]:
# load the company name / security code mapper
mapper = pd.read_excel(f'{CFG.jpx_path}/data_j.xls')

print(mapper.shape)
mapper.head()

In [None]:
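# map: take the security code in front of the 'JT' suffix
# (codes that are not purely numeric are skipped, because int() would raise on them)
jp_tickers_cd = [int(c.split(' ')[0]) for c in jp_tickers if c.split(' ')[0].isdigit()]

# 'コード' is the security code column of the JPX list
numerai_jps = mapper.loc[mapper['コード'].isin(jp_tickers_cd)]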
print(numerai_jps.shape)
numerai_jps.style.background_gradient(cmap='BuGn')
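
Since `isin` silently drops codes that have no match, it can be worth checking which codes fall through. Here is a minimal sketch on a toy mapper. The `コード` column is the one used above; the `銘柄名` (issue name) column is my assumption about the JPX file layout, and the rows and the names `toy_mapper`, `toy_codes`, `matched`, and `unmatched` are illustrative.

```python
import pandas as pd

# Toy stand-in for the JPX list; '銘柄名' (issue name) is an assumed column name.
toy_mapper = pd.DataFrame({
    "コード": [7203, 6758, 9984],
    "銘柄名": ["Toyota Motor", "Sony Group", "SoftBank Group"],
})
toy_codes = [7203, 9984, 1234]  # 1234 is deliberately absent from the mapper

# Rows of the mapper whose code is in our list.
matched = toy_mapper.loc[toy_mapper["コード"].isin(toy_codes)]

# Codes from our list that the mapper does not contain.
unmatched = sorted(set(toy_codes) - set(toy_mapper["コード"].tolist()))
```

On the real data, a non-empty `unmatched` would point to tickers that are in the Numerai universe but missing from the JPX snapshot.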

In [None]:
# save
numerai_jps.to_csv(f'{CFG.OUTPUT_DIR}/numerai_jp_stocks.csv', index=False, encoding='utf-8-sig')
logger.info('saved!')

# EDA
Let's perform a simple EDA (Exploratory Data Analysis).

In [None]:
numerai_jps.columns.values.tolist()

In [None]:
# '市場・商品区分' = market / product category
numerai_jps['市場・商品区分'].value_counts()

In [None]:
# '33業種区分' = 33-sector classification
numerai_jps['33業種区分'].value_counts()

In [None]:
# '17業種区分' = 17-sector classification
numerai_jps['17業種区分'].value_counts()

In [None]:
# '規模区分' = size category
numerai_jps['規模区分'].value_counts()
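
Raw counts are easier to compare next to their share of the total. Below is a small sketch of a helper that returns both; `count_with_share` is a hypothetical name, and the toy rows only stand in for `numerai_jps`.

```python
import pandas as pd


def count_with_share(frame, column):
    """Count each category in `column` and its share of all non-missing rows."""
    counts = frame[column].value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Illustrative rows only; the real frame is numerai_jps.
toy_jps = pd.DataFrame({
    "規模区分": ["TOPIX Core30", "TOPIX Core30", "TOPIX Large70", "TOPIX Mid400"],
})
size_summary = count_with_share(toy_jps, "規模区分")
```

The same call works for any of the category columns above, e.g. `count_with_share(numerai_jps, '33業種区分')`.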

So let's have fun investing in Japan :D