In [24]:
import pandas as pd

## Obtain list of URLs for patents from Google Patents

I searched patents.google.com for the keyword "autonomous vehicles" and restricted the results to US patents with a priority date on or after January 1, 2012. The link to this search is provided below.

https://patents.google.com/?q=autonomous+vehicles&country=US&after=priority:20120101

In [25]:
# Load the CSV file exported from the patents.google.com search results
# and assign the column names explicitly
file_path = "./data/autonomous_full.csv"
column_names = [
    "id",
    "title",
    "assignee",
    "inventor/author",
    "priority date",
    "filing/creation date",
    "publication date",
    "grant date",
    "result link",
]
df = pd.read_csv(file_path, sep=",", names=column_names)

Get the first 3 rows using `head(n)`

In [26]:
df.head(3)

| | id | title | assignee | inventor/author | priority date | filing/creation date | publication date | grant date | result link |
|---|---|---|---|---|---|---|---|---|---|
| 0 | search URL: | https://patents.google.com/?q=autonomous+vehic... | | | | | | | |
| 1 | id | title | assignee | inventor/author | priority date | filing/creation date | publication date | grant date | result link |
| 2 | US-2019204842-A1 | Trajectory planner with dynamic cost learning ... | GM Global Technology Operations LLC | Sayyed Rouhollah Jafari Tafti, Guangyu J. Zou,... | 2018-01-02 | 2018-01-02 | 2019-07-04 | | https://patents.google.com/patent/US2019020484... |


In [27]:
# The first two rows are not patent records: one holds the search URL
# and the other repeats the CSV's own header
# Drop both so the DataFrame is clean and ready for analysis
df = df[2:]
df = df.reset_index(drop=True)
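
As an alternative to dropping the two rows after loading, the search-URL line can be skipped at read time. The sketch below assumes the exported file has exactly one such line before the header, which matches the output shown above; `sample_csv` is a small hypothetical stand-in for the real file, and the column names then come from the file's own header rather than from `column_names`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the exported file: line 1 holds the search URL,
# line 2 holds the real header, and the data rows follow.
sample_csv = io.StringIO(
    "search URL:,https://patents.google.com/?q=autonomous+vehicles\n"
    "id,title,assignee\n"
    "US-9315178-B1,Model checking for autonomous vehicles,Google Inc.\n"
)

# skiprows=1 drops the search-URL line, so the next line is used as the header.
df_alt = pd.read_csv(sample_csv, skiprows=1)

print(df_alt.columns.tolist())
print(len(df_alt))
```

With this approach no slicing or `reset_index` call is needed, but it depends on the export layout staying the same.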


In [28]:
df.head(3)

| | id | title | assignee | inventor/author | priority date | filing/creation date | publication date | grant date | result link |
|---|---|---|---|---|---|---|---|---|---|
| 0 | US-2019204842-A1 | Trajectory planner with dynamic cost learning ... | GM Global Technology Operations LLC | Sayyed Rouhollah Jafari Tafti, Guangyu J. Zou,... | 2018-01-02 | 2018-01-02 | 2019-07-04 | | https://patents.google.com/patent/US2019020484... |
| 1 | US-9315178-B1 | Model checking for autonomous vehicles | Google Inc. | David I. Ferguson, Dmitri A. Dolgov, Christoph... | 2012-04-13 | 2012-04-13 | 2016-04-19 | 2016-04-19 | https://patents.google.com/patent/US9315178B1/en |
| 2 | US-2018201256-A1 | Autonomous parking of vehicles inperpendicular... | Ford Global Technologies, Llc | Eric Hongtei Tseng, Li Xu, Kyle Simmons, Dougl... | 2017-01-13 | 2017-01-13 | 2018-07-19 | | https://patents.google.com/patent/US2018020125... |


#### Data types

In [29]:
df.dtypes

id                      object
title                   object
assignee                object
inventor/author         object
priority date           object
filing/creation date    object
publication date        object
grant date              object
result link             object
dtype: object
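
Every column is loaded as `object`, including the four date columns. A minimal sketch of converting date strings with `pd.to_datetime` follows; it uses a small stand-in frame (`sample`, with values copied from the rows shown earlier) rather than the full DataFrame, and `errors="coerce"` is an assumption about how missing or malformed dates should be handled.

```python
import pandas as pd

# Small stand-in frame; values are copied from the rows shown earlier.
sample = pd.DataFrame(
    {
        "id": ["US-2019204842-A1", "US-9315178-B1"],
        "priority date": ["2018-01-02", "2012-04-13"],
        "grant date": [None, "2016-04-19"],
    }
)

date_columns = ["priority date", "grant date"]
for column in date_columns:
    # errors="coerce" turns missing or unparseable values into NaT
    sample[column] = pd.to_datetime(sample[column], errors="coerce")

print(sample.dtypes)
```

The same loop could be applied to `df` with all four date column names once the header rows have been removed.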

#### Describe

In [30]:
df.describe()

| | id | title | assignee | inventor/author | priority date | filing/creation date | publication date | grant date | result link |
|---|---|---|---|---|---|---|---|---|---|
| count | 25000 | 25000 | 25000 | 25000 | 24934 | 25000 | 25000 | 16319 | 25000 |
| unique | 25000 | 23367 | 6595 | 21421 | 2551 | 2774 | 881 | 432 | 25000 |
| top | US-2019204842-A1 | Vehicle control device, vehicle control method... | Ford Global Technologies, Llc | Christopher P. Ricci | 2013-03-15 | 2017-09-29 | 2020-10-27 | 2020-10-27 | https://patents.google.com/patent/US2019020484... |
| freq | 1 | 62 | 972 | 30 | 163 | 56 | 94 | 94 | 1 |
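
For `object` columns, `describe()` reports the number of non-null values (`count`), the number of distinct values (`unique`), the most frequent value (`top`), and how often it occurs (`freq`). The sketch below shows how `top` and `freq` relate to `value_counts()`, using a three-element hypothetical series in place of the real `assignee` column.

```python
import pandas as pd

# Hypothetical miniature of the assignee column.
assignees = pd.Series(
    [
        "Ford Global Technologies, Llc",
        "Google Inc.",
        "Ford Global Technologies, Llc",
    ],
    name="assignee",
)

summary = assignees.describe()      # count, unique, top, freq
counts = assignees.value_counts()   # occurrences per distinct value

print(summary)
print(counts)
```

Because `count` excludes missing values, a `count` below the number of rows (as for `grant date` and `priority date` above) indicates missing entries in that column.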


---

### Pandas Output files

- Saving Pickle (Python format)

In [31]:
output_filepath = "./data/outputs/autonomous.pkl"
df.to_pickle(output_filepath)
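
A pickle file can be loaded back with `pd.read_pickle()`. The following sketch checks the round trip on a small stand-in frame written to a temporary directory; the frame and the file name `sample.pkl` are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in frame with a datetime column that includes a missing value.
sample = pd.DataFrame(
    {
        "id": ["US-9315178-B1", "US-2019204842-A1"],
        "grant date": pd.to_datetime(["2016-04-19", None]),
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "sample.pkl"
    sample.to_pickle(pickle_path)
    restored = pd.read_pickle(pickle_path)

# Values and dtypes both survive the round trip.
pickle_roundtrip_ok = restored.equals(sample)
print(pickle_roundtrip_ok)
print(restored.dtypes)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.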

### Other types

There are other ways to save and load a pandas DataFrame.

The most commonly used formats are:
- CSV (`.csv`): Comma-separated values (does not preserve data types)
    - `df.to_csv()` and `pd.read_csv()`
- Parquet (`.parquet`): Apache Parquet, a columnar format
    - `df.to_parquet()` and `pd.read_parquet()`
    


In [32]:
output_filepath = "./data/outputs/autonomous.csv"
df.to_csv(output_filepath)
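
Two details of the CSV round trip are worth illustrating. By default `to_csv()` also writes the index, which comes back as an `Unnamed: 0` column, and datetime columns are not restored as datetimes unless `parse_dates` is passed to `pd.read_csv()`. The sketch below uses an in-memory buffer and a one-row stand-in frame, so the names `sample`, `plain`, and `parsed` are illustrative.

```python
import io

import pandas as pd

sample = pd.DataFrame(
    {
        "id": ["US-9315178-B1"],
        "grant date": pd.to_datetime(["2016-04-19"]),
    }
)

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False keeps only the data columns.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

buffer.seek(0)
plain = pd.read_csv(buffer)  # dates are read back as text

buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["grant date"])  # dates restored

print(reloaded_with_index.columns.tolist())
print(plain.dtypes)
print(parsed.dtypes)
```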