In [1]:
# Install latest version from GitHub
!pip install -q -U git+https://github.com/jdvelasq/techminer

# Keyword completion

## Data loading

In this step, the records for the study are selected. The file produced in the previous step is loaded with:

In [2]:
import pandas as pd

df = pd.read_json(
    "https://raw.githubusercontent.com/jdvelasq/techminer/master/data/tutorial/"
    + "keyword-completation.json",
    orient="records",
    lines=True,
)
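The `orient="records", lines=True` arguments indicate a JSON Lines file: one JSON object per record, one record per line. The following is a minimal sketch of that format using a small in-memory table; the toy data and the names `toy`, `text`, and `restored` are illustrative assumptions and are not part of the tutorial file.

```python
import io

import pandas as pd

# Toy records (hypothetical data, not taken from the tutorial file).
toy = pd.DataFrame(
    {
        "Title": ["Paper A", "Paper B"],
        "Author Keywords": ["Deep learning;Forecasting", None],
    }
)

# One JSON object per line; missing values are written as null.
text = toy.to_json(orient="records", lines=True)
print(text)

# Reading the text back uses the same arguments as the loading cell.
restored = pd.read_json(io.StringIO(text), orient="records", lines=True)
print(restored.shape)
```

Missing values come back as pandas missing values after loading, which is why the next cell normalizes them.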

`NaN` values are replaced with `None`.

In [3]:
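# Replace missing scalar values with None. The `is True` test leaves
# list-like cells untouched, since pd.isna returns an array for them.
df = df.applymap(lambda x: None if pd.isna(x) is True else x)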

## Keyword completion

This step creates a column (field) in the dataframe containing key terms for document selection. The columns `'Author Keywords'` and `'Index Keywords'` are joined into a new column called `'Keywords'`.

In [4]:
from techminer import DataFrame

df = DataFrame(df).keywords_fusion()
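To make the idea of joining the two columns concrete, here is a minimal sketch in plain Python. It is not techminer's implementation: the helper `fuse_keywords`, the `;` separator handling, and the sorting and de-duplication of terms are assumptions made for illustration.

```python
import pandas as pd


def fuse_keywords(author, index, sep=";"):
    """Join two separator-delimited keyword strings into one string.

    Hypothetical helper: terms are stripped, de-duplicated, and sorted.
    Returns None when both inputs are missing.
    """
    terms = set()
    for value in (author, index):
        # Skip None and NaN; pd.isna is False for ordinary strings.
        if value is None or pd.isna(value):
            continue
        terms.update(t.strip() for t in value.split(sep) if t.strip())
    if not terms:
        return None
    return sep.join(sorted(terms))


print(fuse_keywords("Deep learning;Forecasting", "Forecasting;Commerce"))
print(fuse_keywords(None, None))
```

Under these assumptions, a record with neither field filled in ends up with `None`, which matches the behaviour examined in the following cells.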

In [5]:
#
# New column Keywords:
#
df.Keywords

0      Component trends;Empirical mode decomposition;...
1      Consumer price index;Costs;Distributed represe...
2      Algorithms;And financial time series predictio...
3      Artificial intelligence;Auto-regressive exogen...
4      Commerce;Deep learning;Electronic trading;Fina...
                             ...                        
147                                                 None
148                                                 None
149                                                 None
150                                                 None
151                                                 None
Name: Keywords, Length: 152, dtype: object

However, some records have neither `'Author Keywords'` nor `'Index Keywords'`, so their `'Keywords'` value is `None`.

In [6]:
len(df[df.Keywords.map(lambda x: x is None)])

8
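The count above depends on missing values being stored as `None`. As a hedged alternative, `Series.isna()` treats both `None` and `NaN` as missing, so it gives the same count whether or not the `NaN`-to-`None` replacement was applied. The toy column below is hypothetical data used only to illustrate this.

```python
import numpy as np
import pandas as pd

# Toy column (hypothetical data) mixing None and NaN as missing markers.
toy = pd.DataFrame(
    {
        "Keywords": pd.Series(
            ["Deep learning;Forecasting", None, np.nan, "Commerce"],
            dtype=object,
        )
    }
)

# Identity test: counts only the entries that are exactly None.
n_none = int(toy.Keywords.map(lambda x: x is None).sum())

# isna(): counts None and NaN alike.
n_missing = int(toy.Keywords.isna().sum())

print(n_none, n_missing)
```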

In [7]:
#
# Verification:
#
df.Keywords[
    (df["Author Keywords"].map(lambda x: x is None))
    & (df["Index Keywords"].map(lambda x: x is None))
]

144    None
145    None
146    None
147    None
148    None
149    None
150    None
151    None
Name: Keywords, dtype: object

In [8]:
#
# Complete keywords based on abstract and title
#
df = DataFrame(df).keywords_completation()

In [9]:
#
# Verify the number of rows without keywords
#
len(df[df.Keywords.map(lambda x: x is None)])

0
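One way to picture completion from the title and abstract is to search those fields for terms already used as keywords elsewhere in the dataset. The sketch below follows that idea; it is an assumption about the general approach, not a description of what `keywords_completation()` actually does, and the helper `complete_keywords`, the toy records, and the whole-word matching rule are all hypothetical.

```python
import re


def complete_keywords(keywords, title, abstract, vocabulary, sep=";"):
    """Return existing keywords, or vocabulary terms found in the text.

    Hypothetical helper: matching is case-insensitive on whole words.
    """
    if keywords is not None:
        return keywords
    text = " ".join(
        part for part in (title, abstract) if isinstance(part, str)
    ).lower()
    found = sorted(
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    )
    return sep.join(found) if found else None


# Toy records (hypothetical data).
records = [
    {"Title": "A survey", "Abstract": "Overview.",
     "Keywords": "Deep learning;Forecasting"},
    {"Title": "Forecasting stock prices with deep learning",
     "Abstract": "We train a network.", "Keywords": None},
]

# Vocabulary: every term used by records that already have keywords.
vocabulary = {
    term.strip()
    for record in records
    if record["Keywords"]
    for term in record["Keywords"].split(";")
}

completed = [
    complete_keywords(r["Keywords"], r["Title"], r["Abstract"], vocabulary)
    for r in records
]
print(completed)
```

In this sketch a record stays at `None` when no vocabulary term appears in its title or abstract, so checking the number of empty rows afterwards, as in the cell above, remains worthwhile.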

In [10]:
### df.to_json("deletion-of-records.json", orient="records", lines=True)