# Scraping the data

**The following code sent about 50 requests to SerpAPI and saved the resulting dataframes as CSV files in the `data` folder.**

**Please do not re-run this code, as it will use up the API credits.**

**Use the code from section 2 instead to read the data.**
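
One way to make an accidental re-run harmless is to read the cached CSV whenever it already exists and to call the API only when it does not. The sketch below is a minimal illustration of that idea: `load_or_fetch` is a hypothetical helper (it is not part of `startupjh`), and `fetch` stands for any function that returns a dataframe.

```python
from pathlib import Path

import pandas as pd


def load_or_fetch(csv_path, fetch):
    """Return the cached CSV if it exists, otherwise call `fetch` once and cache the result."""
    csv_path = Path(csv_path)
    if csv_path.exists():
        # Cached results found: no API request is made
        return pd.read_csv(csv_path)
    df = fetch()  # only reached when no cached file is present
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(csv_path, index=False)
    return df
```

Under these assumptions, a call such as `load_or_fetch('../data/primary_results.csv', lambda: serpapi.serpapi_full_cite(papers_df))` would spend credits only the first time it runs.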

## Imports

In [21]:
from serpapi import GoogleSearch
import pandas as pd
import numpy as np
import re

import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
import plotly.offline as py

from startupjh import serpapi
from startupjh import data_preprocess

## Collecting data using `serpapi.py`


In [4]:
papers_df = serpapi.serpapi_og_results()

Enter key words: automation container terminal


In [6]:
primary_df = serpapi.serpapi_full_cite(papers_df)

In [8]:
citing_papers = serpapi.serpapi_cited_by_list(primary_df)

In [11]:
# Saves the two dataframes to csv files in the `data` folder
primary_df.to_csv('../data/primary_results.csv', index=False)
citing_papers.to_csv('../data/citing_papers.csv', index=False)

# Basic preprocessing

## Load the data from `csv` files

In [3]:
# primary_df contains the results from the google scholar search
primary_df = pd.read_csv('../data/primary_results.csv')

In [4]:
# citing_df contains, for each paper in primary_df, the papers that cite it
citing_df = pd.read_csv('../data/citing_papers.csv')

## Extract `authors`, `year`, `pub_info` from `primary_df`


In [8]:
extended_primary_df = data_preprocess.extract_pub_info(primary_df)
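
The implementation of `data_preprocess.extract_pub_info` is not shown in this notebook. As a rough idea of how a `year` could be pulled out of a `pub_info` string, here is a minimal sketch using `re`. The function name `extract_year` and the rule "take the last four-digit year" are assumptions made for illustration, not the module's actual logic.

```python
import re

# Matches a four-digit year from 1900 to 2099 that is not part of a longer number
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def extract_year(pub_info):
    """Return the last four-digit year found in `pub_info`, or "no data" if there is none."""
    years = YEAR_PATTERN.findall(pub_info)
    return years[-1] if years else "no data"


print(extract_year("ERCIM News 56 (2004): 37-39."))
```

A rule this simple can be fooled by page ranges that look like years, so its output would need a manual check.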

## Extract `key_words` from `titles` in both dataframes


In [10]:
extracted_primary_df = data_preprocess.extract_key_words(extended_primary_df)

In [12]:
extracted_citing_df = data_preprocess.extract_key_words(citing_df)
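
The body of `data_preprocess.extract_key_words` is not reproduced here either. Judging only from the `key_words` column shown further down, the titles appear to be lower-cased, stripped of ASCII punctuation and filtered for stop words. The sketch below follows that guess: `title_to_key_words` and the short `STOP_WORDS` set are hypothetical and only meant to illustrate the idea.

```python
import string

# Small illustrative stop-word list (an assumption; a real list would be longer)
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

# Translation table that deletes every ASCII punctuation character
REMOVE_PUNCTUATION = str.maketrans("", "", string.punctuation)


def title_to_key_words(title):
    """Lower-case a title, drop ASCII punctuation and stop words, and join with ', '."""
    cleaned = title.lower().translate(REMOVE_PUNCTUATION)
    words = [word for word in cleaned.split() if word not in STOP_WORDS]
    return ", ".join(words)


print(title_to_key_words("Agent-based approaches to transport logistics"))
# agentbased, approaches, transport, logistics
```

Because only ASCII punctuation is removed, a non-ASCII hyphen such as the one in "trade‐offs" is left in place.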

# Basic visualisation

In [16]:
extracted_citing_df.head(15)

Unnamed: 0,paper_id,citing_paper_id,title,result_id,link,snippet,resources_title,resources_link,citation_count,cites_id,versions,cluster_id,key_words
0,0,0,MicroPort: A general simulation platform for s...,o4mmoriNu70J,https://www.sciencedirect.com/science/article/...,Seaport container terminals are essential node...,no data,no data,86,13671676917955594659,4,13671676917955594659,"microport, general, simulation, platform, seap..."
1,1,0,Agent-based simulation of stakeholders relatio...,p-ou00crS34J,https://citeseerx.ist.psu.edu/viewdoc/download...,Port management is often faced with many vexin...,psu.edu,https://citeseerx.ist.psu.edu/viewdoc/download...,75,9100415059517958823,7,9100415059517958823,"agentbased, simulation, stakeholders, relation..."
2,2,0,Agent based simulation architecture for evalua...,bnLMKYOdJ9YJ,https://link.springer.com/article/10.1007/s104...,An agent based simulator for evaluating operat...,sc.edu,https://jmvidal.cse.sc.edu/library/henesey09b.pdf,59,15431475834875834990,11,15431475834875834990,"agent, based, simulation, architecture, evalua..."
3,3,0,Agent based simulation architecture for evalua...,oXp7vE9e2RsJ,https://link.springer.com/chapter/10.1007/1187...,An agent based simulator for evaluating operat...,psu.edu,http://citeseerx.ist.psu.edu/viewdoc/download?...,52,2006738805527902881,16,2006738805527902881,"agent, based, simulation, architecture, evalua..."
4,4,0,Agent-based approaches to transport logistics,m6wq9uLSXOgJ,https://citeseerx.ist.psu.edu/viewdoc/download...,This paper provides a survey of existing resea...,psu.edu,https://citeseerx.ist.psu.edu/viewdoc/download...,43,16743489386891095195,4,16743489386891095195,"agentbased, approaches, transport, logistics"
5,5,0,Simulation and the lean port environment,v7e1Dmpk43sJ,https://link.springer.com/article/10.1057/palg...,This paper considers the applicability of simu...,no data,no data,31,8927089293054556095,6,8927089293054556095,"simulation, lean, port, environment"
6,6,0,A market-based approach to container port term...,sjtE2dT97mYJ,https://www.researchgate.net/profile/Ingo-Timm...,"In sea ports, the management of container term...",researchgate.net,https://www.researchgate.net/profile/Ingo-Timm...,31,7417144726945807282,3,7417144726945807282,"marketbased, approach, container, port, termin..."
7,7,0,Analysing trade‐offs in container loading: com...,edSwUWCSIOMJ,https://onlinelibrary.wiley.com/doi/abs/10.111...,In this paper we describe two operations resea...,exeter.ac.uk,https://ore.exeter.ac.uk/repository/bitstream/...,23,16366241988249441401,4,16366241988249441401,"analysing, trade‐offs, container, loading, com..."
8,8,0,Multiobjective optimization of a port-of-entry...,ZcMjKkz2bGEJ,https://ieeexplore.ieee.org/abstract/document/...,"At the port-of-entry, containers are inspected...",arxiv.org,https://arxiv.org/pdf/1505.05184,23,7020256726148694885,23,7020256726148694885,"multiobjective, optimization, portofentry, ins..."
9,9,0,Container Handling using multi-agent architecture,oyFvORrVEAQJ,https://link.springer.com/chapter/10.1007/978-...,Container terminals are essential intermodal i...,academia.edu,https://www.academia.edu/download/50379242/Con...,19,292968284388532643,7,292968284388532643,"container, handling, using, multiagent, archit..."


In [31]:
# Plots the number of publications per year for extracted_primary_df
pubs_per_year = extracted_primary_df.groupby("year").count()["paper_id"]
fig = go.Figure(data=[go.Bar(x=pubs_per_year.index, y=pubs_per_year,
        textposition="outside",
        textangle=0)])


fig.update_layout(title = "Publications per Year - 'automation container terminal'" )

# Update xaxis properties
fig.update_xaxes(title="Year of Publication")

# Update yaxis properties
fig.update_yaxes(title="Number of Publications")
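
The per-year counts behind this bar chart can also be obtained with `value_counts`, which may be easier to inspect than the grouped dataframe. This is a minimal sketch on an illustrative `years` series that reuses the values of the `year` column shown in this notebook; the variable names are assumptions.

```python
import pandas as pd

# Illustrative years, copied from the `year` column of the primary results
years = pd.Series(
    ["2000", "2014", "2006", "1999", "2004", "2019", "2014", "2018", "2021", "2018"],
    name="year",
)

# value_counts() counts each distinct year; sort_index() orders the years chronologically
pubs_per_year = years.value_counts().sort_index()
print(pubs_per_year)
```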

In [48]:
key_words = []
for _, row in extracted_primary_df.iterrows():
    key_word_list = row.key_words.split(',')
    key_words.append(key_word_list)
extracted_primary_df["key_words_list"] = key_words
extracted_primary_df.head(2)

Unnamed: 0,paper_id,title,result_id,link,snippet,resources_title,resources_link,citation_count,cites_id,versions,cluster_id,full_citation,authors,pub_info,year,key_words,key_words_list
0,0,A multi-agent system for the automation of a p...,qljx-SC6MYEJ,https://www.academia.edu/download/48859036/A_m...,This paper presents a system architecture whic...,academia.edu,https://www.academia.edu/download/48859036/A_m...,75,9309426555546589354,4,9309426555546589354,"Rebollo, Miguel, et al. ""A multi-agent system ...","Rebollo, Miguel, et al.",Workshop in Agents in Industry. Barcelona. 2000.,2000,"multiagent, system, automation, port, containe...","[multiagent, system, automation, port, con..."
1,1,Automation in port container terminals,hQkbtalB1OAJ,https://www.sciencedirect.com/science/article/...,… “ Business Process Model and Notation (BPMN)...,sciencedirect.com,https://www.sciencedirect.com/science/article/...,49,16200645956702243205,3,16200645956702243205,"Martín-Soberón, Ana María, et al. ""Automation ...","Martín-Soberón, Ana María, et al.",Procedia-Social and Behavioral Sciences 160 (2...,2014,"automation, port, container, terminals","[automation, port, container, terminals]"


In [56]:
stripped_key_words = []
for _, row in extracted_primary_df.iterrows():
    kw_list = []
    for e in row.key_words_list:
        kw_list.append(e.strip())
    stripped_key_words.append(kw_list)
extracted_primary_df["stripped_kw_list"] = stripped_key_words

In [59]:
extracted_primary_df.stripped_kw_list[1]

['automation', 'port', 'container', 'terminals']
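
The split and strip steps above can be folded into one small function. This is a minimal sketch; `split_key_words` is a hypothetical helper, and unlike the loops above it also drops empty entries.

```python
def split_key_words(key_words):
    """Split a comma-separated string and strip the whitespace around each key word."""
    return [word.strip() for word in key_words.split(",") if word.strip()]


print(split_key_words("automation, port, container, terminals"))
# ['automation', 'port', 'container', 'terminals']
```

With pandas it could be applied as `extracted_primary_df["key_words"].apply(split_key_words)`, which avoids the intermediate `key_words_list` column.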

In [62]:
extracted_primary_df.drop(columns=["key_words", "key_words_list"], inplace=True)

In [63]:
extracted_primary_df

Unnamed: 0,paper_id,title,result_id,link,snippet,resources_title,resources_link,citation_count,cites_id,versions,cluster_id,full_citation,authors,pub_info,year,stripped_kw_list
0,0,A multi-agent system for the automation of a p...,qljx-SC6MYEJ,https://www.academia.edu/download/48859036/A_m...,This paper presents a system architecture whic...,academia.edu,https://www.academia.edu/download/48859036/A_m...,75,9309426555546589354,4,9309426555546589354,"Rebollo, Miguel, et al. ""A multi-agent system ...","Rebollo, Miguel, et al.",Workshop in Agents in Industry. Barcelona. 2000.,2000,"[multiagent, system, automation, port, contain..."
1,1,Automation in port container terminals,hQkbtalB1OAJ,https://www.sciencedirect.com/science/article/...,… “ Business Process Model and Notation (BPMN)...,sciencedirect.com,https://www.sciencedirect.com/science/article/...,49,16200645956702243205,3,16200645956702243205,"Martín-Soberón, Ana María, et al. ""Automation ...","Martín-Soberón, Ana María, et al.",Procedia-Social and Behavioral Sciences 160 (2...,2014,"[automation, port, container, terminals]"
2,2,Container port automation,0q7hE6UAsyEJ,https://link.springer.com/content/pdf/10.1007/...,… The Patrick Fisherman's Island terminal is n...,no data,no data,13,2428285333085990610,6,2428285333085990610,"Nelmes, Graeme. ""Container port automation."" F...","Nelmes, Graeme.","Field and Service Robotics. Springer, Berlin, ...",2006,"[container, port, automation]"
3,3,TRACES: TRAFFIC CONTROL ENGINEERING SYSTEM A c...,wgi33W6YplUJ,https://citeseerx.ist.psu.edu/viewdoc/download...,In this study a control system to coordinate t...,psu.edu,https://citeseerx.ist.psu.edu/viewdoc/download...,27,6171787941291428034,6,6171787941291428034,"Duinkerken, Mark B., Joseph JM Evers, and Jaap...","Duinkerken, Mark B., Joseph JM Evers, and Jaap...",signal 2 (1999): v3.,1999,"[traces, traffic, control, engineering, system..."
4,4,Multi-agent system technology in a port contai...,Q4_ut9Yo1yoJ,https://www.researchgate.net/profile/V-Botti/p...,In response to the arrival of a ship (ship age...,researchgate.net,https://www.researchgate.net/profile/V-Botti/p...,13,3086980972259741507,no data,no data,"Botti, Vicent J. ""Multi-agent system technolog...","Botti, Vicent J.",ERCIM News 56 (2004): 37-39.,2004,"[multiagent, system, technology, port, contain..."
5,5,Alignments between strategic content and proce...,CcJLei5JHs4J,https://link.springer.com/article/10.1057/s412...,"During the last three decades, technological i...",springer.com,https://link.springer.com/article/10.1057/s412...,14,14852389085083582985,9,14852389085083582985,"Wang, Ping, Joan P. Mileski, and Qingcheng Zen...","Wang, Ping, Joan P. Mileski, and Qingcheng Zeng.",Maritime Economics & Logistics 21.4 (2019): 54...,2019,"[alignments, strategic, content, process, stru..."
6,6,Performance analysis of a new type of automate...,bIPMB1zTRUsJ,http://citeseerx.ist.psu.edu/viewdoc/download?...,… This paper analyzes the efficient and econom...,psu.edu,http://citeseerx.ist.psu.edu/viewdoc/download?...,8,5423973718458925932,4,5423973718458925932,"Yan, Wei, Yishi Zhu, and Junliang He. ""Perform...","Yan, Wei, Yishi Zhu, and Junliang He.",International Journal of Hybrid Information 7....,2014,"[performance, analysis, new, type, automated, ..."
7,7,A Study on Application of Yard Transportaion E...,pb-3CDRxiGEJ,https://www.koreascience.or.kr/article/JAKO201...,International major container terminals are tr...,koreascience.or.kr,https://www.koreascience.or.kr/article/JAKO201...,5,7027991686810156965,2,7027991686810156965,"Cha, Sang-Hyun, and Chang-Kyun Noh. ""A Study o...","Cha, Sang-Hyun, and Chang-Kyun Noh.",Journal of Navigation and Port Research 42.3 (...,2018,"[study, application, yard, transportaion, equi..."
8,8,Cooperative Scheduling of AGV and ASC in Autom...,T70_WBefKj8J,https://www.hindawi.com/journals/mpe/2021/5764...,The key problem of operation optimization for ...,hindawi.com,https://www.hindawi.com/journals/mpe/2021/5764...,1,4551625296024943951,6,4551625296024943951,"Zhang, Qinglei, et al. ""Cooperative Scheduling...","Zhang, Qinglei, et al.",Mathematical Problems in Engineering 2021 (2021).,2021,"[cooperative, scheduling, agv, asc, automation..."
9,9,New technologies and the transformation of wor...,4q8DAnoCKioJ,https://onlinelibrary.wiley.com/doi/abs/10.111...,… and Sydney terminals. The newly introduced V...,no data,no data,30,3038243621657882594,2,3038243621657882594,"Gekara, Victor Oyaro, and Vi‐Xuan Thanh Nguyen...","Gekara, Victor Oyaro, and Vi‐Xuan Thanh Nguyen.","New Technology, Work and Employment 33.3 (2018...",2018,"[new, technologies, transformation, work, skil..."


In [64]:
all_key_words = []
for _, row in extracted_primary_df.iterrows():
    all_key_words.append(row.stripped_kw_list)
all_key_words

[['multiagent', 'system', 'automation', 'port', 'container', 'terminal'],
 ['automation', 'port', 'container', 'terminals'],
 ['container', 'port', 'automation'],
 ['traces',
  'traffic',
  'control',
  'engineering',
  'system',
  'casestudy',
  'container',
  'terminal',
  'automation'],
 ['multiagent',
  'system',
  'technology',
  'port',
  'container',
  'terminal',
  'automation'],
 ['alignments',
  'strategic',
  'content',
  'process',
  'structure',
  'case',
  'container',
  'terminal',
  'service',
  'process',
  'automation'],
 ['performance',
  'analysis',
  'new',
  'type',
  'automated',
  'container',
  'terminal'],
 ['study',
  'application',
  'yard',
  'transportaion',
  'equipment',
  'automation',
  'system',
  'container',
  'terminal'],
 ['cooperative',
  'scheduling',
  'agv',
  'asc',
  'automation',
  'container',
  'terminal',
  'relay',
  'operation',
  'mode'],
 ['new',
  'technologies',
  'transformation',
  'work',
  'skills',
  'study',
  'computerisatio

In [65]:
flat_kw_list = [item for sublist in all_key_words for item in sublist]
flat_kw_list

['multiagent',
 'system',
 'automation',
 'port',
 'container',
 'terminal',
 'automation',
 'port',
 'container',
 'terminals',
 'container',
 'port',
 'automation',
 'traces',
 'traffic',
 'control',
 'engineering',
 'system',
 'casestudy',
 'container',
 'terminal',
 'automation',
 'multiagent',
 'system',
 'technology',
 'port',
 'container',
 'terminal',
 'automation',
 'alignments',
 'strategic',
 'content',
 'process',
 'structure',
 'case',
 'container',
 'terminal',
 'service',
 'process',
 'automation',
 'performance',
 'analysis',
 'new',
 'type',
 'automated',
 'container',
 'terminal',
 'study',
 'application',
 'yard',
 'transportaion',
 'equipment',
 'automation',
 'system',
 'container',
 'terminal',
 'cooperative',
 'scheduling',
 'agv',
 'asc',
 'automation',
 'container',
 'terminal',
 'relay',
 'operation',
 'mode',
 'new',
 'technologies',
 'transformation',
 'work',
 'skills',
 'study',
 'computerisation',
 'automation',
 'australian',
 'container',
 'terminals']

In [66]:
from collections import Counter

key_words_sorted = Counter(flat_kw_list).most_common()
key_words_sorted

[('container', 10),
 ('automation', 9),
 ('terminal', 7),
 ('system', 4),
 ('port', 4),
 ('multiagent', 2),
 ('terminals', 2),
 ('process', 2),
 ('new', 2),
 ('study', 2),
 ('traces', 1),
 ('traffic', 1),
 ('control', 1),
 ('engineering', 1),
 ('casestudy', 1),
 ('technology', 1),
 ('alignments', 1),
 ('strategic', 1),
 ('content', 1),
 ('structure', 1),
 ('case', 1),
 ('service', 1),
 ('performance', 1),
 ('analysis', 1),
 ('type', 1),
 ('automated', 1),
 ('application', 1),
 ('yard', 1),
 ('transportaion', 1),
 ('equipment', 1),
 ('cooperative', 1),
 ('scheduling', 1),
 ('agv', 1),
 ('asc', 1),
 ('relay', 1),
 ('operation', 1),
 ('mode', 1),
 ('technologies', 1),
 ('transformation', 1),
 ('work', 1),
 ('skills', 1),
 ('computerisation', 1),
 ('australian', 1)]
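
The counts above treat variants such as `terminal` and `terminals` as different key words. If that is not wanted, one option is to map variants to a canonical form before counting. The sketch below assumes a hand-written alias table; `ALIASES` and `count_key_words` are hypothetical names, and the sample lists are the first three key word lists shown earlier.

```python
from collections import Counter

# Hypothetical alias table mapping variants to one canonical key word
ALIASES = {"terminals": "terminal", "technologies": "technology"}


def count_key_words(key_word_lists, aliases=ALIASES):
    """Count key words across all papers after mapping variants to a canonical form."""
    counts = Counter()
    for key_word_list in key_word_lists:
        counts.update(aliases.get(word, word) for word in key_word_list)
    return counts.most_common()


sample = [
    ["multiagent", "system", "automation", "port", "container", "terminal"],
    ["automation", "port", "container", "terminals"],
    ["container", "port", "automation"],
]
print(count_key_words(sample))
```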

In [68]:
key_words_df = pd.DataFrame(key_words_sorted, columns=["key_word", "occurence"])
key_words_df.head(5)

Unnamed: 0,key_word,occurence
0,container,10
1,automation,9
2,terminal,7
3,system,4
4,port,4


In [69]:
# Plots the most common key words in extracted_primary_df
fig = go.Figure(data=[go.Bar(x=key_words_df["key_word"], y= key_words_df["occurence"],
        textposition="outside",
        textangle=0)])


fig.update_layout(title = "Most common key words - 'automation container terminal'" )

# Update xaxis properties
fig.update_xaxes(title="Key word")

# Update yaxis properties
fig.update_yaxes(title="Number of occurrences")

# Other

## Remove duplicates from `citing_df`


In [89]:
# Two papers appear twice, most probably due to different sources
citing_df['title'].duplicated().sum()

2

In [90]:
# These two papers appear twice in citing_df
citing_df.loc[citing_df['title'].duplicated()]

Unnamed: 0,paper_id,citing_paper_id,title,result_id,link,snippet,resources_title,resources_link,citation_count,cites_id,versions,cluster_id
3,3,0,Agent based simulation architecture for evalua...,oXp7vE9e2RsJ,https://link.springer.com/chapter/10.1007/1187...,An agent based simulator for evaluating operat...,psu.edu,http://citeseerx.ist.psu.edu/viewdoc/download?...,52,2006738805527902881,16,2006738805527902881
43,3,4,Blockchain applications and architectures for ...,U_RW57CJHoYJ,https://www.sciencedirect.com/science/article/...,Efficient port logistic operations and managem...,no data,no data,7,9664313243272148051,3,9664313243272148051


In [91]:
# Let's drop them
citing_df.drop(axis=0, index=[3, 43], inplace=True)

In [92]:
# Check that the duplicate papers have been removed OK
citing_df['title'].duplicated().sum()

0

In [93]:
citing_df.reset_index(drop=True, inplace=True)

In [94]:
citing_df

Unnamed: 0,paper_id,citing_paper_id,title,result_id,link,snippet,resources_title,resources_link,citation_count,cites_id,versions,cluster_id
0,0,0,MicroPort: A general simulation platform for s...,o4mmoriNu70J,https://www.sciencedirect.com/science/article/...,Seaport container terminals are essential node...,no data,no data,86,13671676917955594659,4,13671676917955594659
1,1,0,Agent-based simulation of stakeholders relatio...,p-ou00crS34J,https://citeseerx.ist.psu.edu/viewdoc/download...,Port management is often faced with many vexin...,psu.edu,https://citeseerx.ist.psu.edu/viewdoc/download...,75,9100415059517958823,7,9100415059517958823
2,2,0,Agent based simulation architecture for evalua...,bnLMKYOdJ9YJ,https://link.springer.com/article/10.1007/s104...,An agent based simulator for evaluating operat...,sc.edu,https://jmvidal.cse.sc.edu/library/henesey09b.pdf,59,15431475834875834990,11,15431475834875834990
3,4,0,Agent-based approaches to transport logistics,m6wq9uLSXOgJ,https://citeseerx.ist.psu.edu/viewdoc/download...,This paper provides a survey of existing resea...,psu.edu,https://citeseerx.ist.psu.edu/viewdoc/download...,43,16743489386891095195,4,16743489386891095195
4,5,0,Simulation and the lean port environment,v7e1Dmpk43sJ,https://link.springer.com/article/10.1057/palg...,This paper considers the applicability of simu...,no data,no data,31,8927089293054556095,6,8927089293054556095
...,...,...,...,...,...,...,...,...,...,...,...,...
77,5,9,Work in progress: barriers and concerns of eld...,3K84JhHMHmMJ,https://link.springer.com/chapter/10.1007/978-...,Digital transformation of work is in progress ...,no data,no data,5,7142370433083944924,no data,no data
78,6,9,The role of collective bargaining in a digitiz...,uQmUnSWbrrYJ,http://lerachapters.org/OJS/ojs-2.4.4-1/index....,It is all too common these days to read headli...,lerachapters.org,http://lerachapters.org/OJS/ojs-2.4.4-1/index....,5,13163629346710358457,2,13163629346710358457
79,7,9,Bridging the Skill Gap in Robotics: Global and...,9jbaQgnutcEJ,https://journals.sagepub.com/doi/abs/10.1177/2...,This article focuses on the demand for skills ...,sagepub.com,https://journals.sagepub.com/doi/pdf/10.1177/2...,5,13958324343648433910,4,13958324343648433910
80,8,9,Understanding the role of employees in digital...,LrjxtsJ5d7oJ,https://www.emerald.com/insight/content/doi/10...,Purpose Much of recent academic and profession...,no data,no data,6,13436341890483075118,4,13436341890483075118


## Fix `paper_id` to make sure each `paper_id` is unique


In [97]:
# Consecutive ids 0, 1, 2, ... for every remaining row
paper_id = list(range(len(citing_df)))

In [98]:
citing_df["paper_id"] = paper_id
citing_df

Unnamed: 0,paper_id,citing_paper_id,title,result_id,link,snippet,resources_title,resources_link,citation_count,cites_id,versions,cluster_id
0,0,0,MicroPort: A general simulation platform for s...,o4mmoriNu70J,https://www.sciencedirect.com/science/article/...,Seaport container terminals are essential node...,no data,no data,86,13671676917955594659,4,13671676917955594659
1,1,0,Agent-based simulation of stakeholders relatio...,p-ou00crS34J,https://citeseerx.ist.psu.edu/viewdoc/download...,Port management is often faced with many vexin...,psu.edu,https://citeseerx.ist.psu.edu/viewdoc/download...,75,9100415059517958823,7,9100415059517958823
2,2,0,Agent based simulation architecture for evalua...,bnLMKYOdJ9YJ,https://link.springer.com/article/10.1007/s104...,An agent based simulator for evaluating operat...,sc.edu,https://jmvidal.cse.sc.edu/library/henesey09b.pdf,59,15431475834875834990,11,15431475834875834990
3,3,0,Agent-based approaches to transport logistics,m6wq9uLSXOgJ,https://citeseerx.ist.psu.edu/viewdoc/download...,This paper provides a survey of existing resea...,psu.edu,https://citeseerx.ist.psu.edu/viewdoc/download...,43,16743489386891095195,4,16743489386891095195
4,4,0,Simulation and the lean port environment,v7e1Dmpk43sJ,https://link.springer.com/article/10.1057/palg...,This paper considers the applicability of simu...,no data,no data,31,8927089293054556095,6,8927089293054556095
...,...,...,...,...,...,...,...,...,...,...,...,...
77,77,9,Work in progress: barriers and concerns of eld...,3K84JhHMHmMJ,https://link.springer.com/chapter/10.1007/978-...,Digital transformation of work is in progress ...,no data,no data,5,7142370433083944924,no data,no data
78,78,9,The role of collective bargaining in a digitiz...,uQmUnSWbrrYJ,http://lerachapters.org/OJS/ojs-2.4.4-1/index....,It is all too common these days to read headli...,lerachapters.org,http://lerachapters.org/OJS/ojs-2.4.4-1/index....,5,13163629346710358457,2,13163629346710358457
79,79,9,Bridging the Skill Gap in Robotics: Global and...,9jbaQgnutcEJ,https://journals.sagepub.com/doi/abs/10.1177/2...,This article focuses on the demand for skills ...,sagepub.com,https://journals.sagepub.com/doi/pdf/10.1177/2...,5,13958324343648433910,4,13958324343648433910
80,80,9,Understanding the role of employees in digital...,LrjxtsJ5d7oJ,https://www.emerald.com/insight/content/doi/10...,Purpose Much of recent academic and profession...,no data,no data,6,13436341890483075118,4,13436341890483075118
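
The deduplication and renumbering steps above rely on hard-coded row indices. A more general variant lets pandas find the repeated titles. This is a minimal sketch on a tiny illustrative dataframe (the rows are made up), assuming that keeping the first occurrence of each title is acceptable.

```python
import pandas as pd

# Tiny illustrative frame with one repeated title
df = pd.DataFrame({
    "paper_id": [0, 1, 2, 3],
    "citing_paper_id": [0, 0, 0, 4],
    "title": ["Paper A", "Paper B", "Paper A", "Paper C"],
})

# Keep the first occurrence of each title, then renumber the index from 0
deduplicated = df.drop_duplicates(subset="title", keep="first").reset_index(drop=True)

# Reassign paper_id so that every remaining row has a unique, consecutive id
deduplicated["paper_id"] = range(len(deduplicated))
print(deduplicated)
```

Note that this compares titles exactly, so two titles that differ only in capitalisation are still treated as different papers.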


## Create SQL database from the two "relational" dataframes

In [118]:
import sqlite3

conn = sqlite3.connect('papers_database')
c = conn.cursor()

c.execute('CREATE TABLE IF NOT EXISTS citingPapers (paper_id number, citing_paper_id number, title text, result_id text, link text, snippet text, resources_title text, resources_link text, citation_count number, cites_id text, versions text, cluster_id text)')
conn.commit()

citing_df.to_sql('citingPapers', conn, if_exists='replace', index = False)
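
Once the table exists, it can be queried for the rows with a given `citing_paper_id`. The sketch below shows one way to do that with a parameterised query and named columns, using only `sqlite3`. It runs against an in-memory database with a reduced, made-up set of rows; `papers_citing` is a hypothetical helper.

```python
import sqlite3

# In-memory database so that the sketch leaves no file behind
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE citingPapers (paper_id INTEGER, citing_paper_id INTEGER, title TEXT)"
)

# Illustrative rows only, not the real data
demo_rows = [(0, 0, "Paper A"), (1, 0, "Paper B"), (2, 4, "Paper C"), (3, 4, "Paper D")]
demo_conn.executemany("INSERT INTO citingPapers VALUES (?, ?, ?)", demo_rows)
demo_conn.commit()


def papers_citing(connection, primary_id):
    """Return the rows whose citing_paper_id equals `primary_id` as a list of dicts."""
    cursor = connection.execute(
        "SELECT paper_id, title FROM citingPapers "
        "WHERE citing_paper_id = ? ORDER BY paper_id",
        (primary_id,),  # the ? placeholder keeps the value out of the SQL string
    )
    # cursor.description holds one tuple per selected column; item 0 is the column name
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


print(papers_citing(demo_conn, 4))
```

Building the dicts from `cursor.description` keeps the column names, which a plain `fetchall()` result does not carry.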

In [102]:
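# First check the dtypes: `cites_id` is stored as `uint64`
primary_df.dtypes

paper_id            int64
title              object
result_id          object
link               object
snippet            object
resources_title    object
resources_link     object
citation_count      int64
cites_id           uint64
versions           object
cluster_id         object
full_citation      object
authors            object
pub_info           object
year               object
dtype: object

In [107]:
# Some cites_id values (e.g. 16200645956702243205) exceed the int64 maximum, so a cast
# to np.int64 would not preserve them. Store the IDs as text instead, which matches
# the `cites_id text` column declared for citingPapers.
primary_df["cites_id"] = primary_df["cites_id"].astype(str)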
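
Several `cites_id` values in this dataset are larger than the largest signed 64-bit integer, which is also the largest integer SQLite can store. The following sketch, which uses only plain Python integers, checks which IDs would fit. `fits_in_int64` is a hypothetical helper and the three IDs are taken from the tables above.

```python
INT64_MIN = -(2**63)     # smallest value a signed 64-bit integer can hold
INT64_MAX = 2**63 - 1    # largest value a signed 64-bit integer can hold


def fits_in_int64(value):
    """Return True if `value` can be stored as a signed 64-bit integer."""
    return INT64_MIN <= int(value) <= INT64_MAX


cites_ids = [9309426555546589354, 16200645956702243205, 2428285333085990610]
for cites_id in cites_ids:
    print(cites_id, fits_in_int64(cites_id))
```

Since these values are identifiers rather than quantities, keeping them as text loses nothing and sidesteps the range limit.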

In [112]:
# Now on to citing_df, all types are good
citing_df.dtypes

paper_id            int64
citing_paper_id     int64
title              object
result_id          object
link               object
snippet            object
resources_title    object
resources_link     object
citation_count      int64
cites_id           object
versions           object
cluster_id         object
dtype: object

In [119]:
c.execute('''  
        SELECT * FROM citingPapers
        WHERE citing_paper_id = 4
          ''')

<sqlite3.Cursor at 0x129c06110>

In [120]:
df = pd.DataFrame(c.fetchall())
print(df)

   0   1                                                  2             3   \
0  39   4  An agent-based solution for the berth allocati...  tW2BTadnl9UJ   
1  40   4  A multi-agent system for container terminal ma...  FIGR_pEPI6IJ   
2  41   4        Agent-based container terminal optimisation  OsmmhJM3uI4J   
3  42   4                           基于 MAS 的集装箱自动化码头协同作业系统模型  QU3Et9QnREcJ   
4  43   4                        集装箱堆场收发箱管理 Multi—Agent 系统研究  H7zmDzkOxbcJ   
5  44   4  Automatic Damage-Detecting System for Port Con...  5p0VMZGeMUcJ   
6  45   4        Agent-Based Container Terminal Optimisation  9GwPOa9xTzcJ   
7  46   4  Container Number Recognition Method Based on S...  T_cGg-fvK1AJ   
8  47   4  Research on Automatic Control System of Bulk T...  e5Yy7J_XF78J   

                                                  4   \
0  http://univagora.ro/jour/index.php/ijccc/artic...   
1  https://ieeexplore.ieee.org/abstract/document/...   
2  https://link.springer.com/chapter/10.1007/978-..