Names-Oracle

The names_oracle module is an easy solution to analyze names and last names. It provides information about countries, sex, for first and last names. See examples below to see all the extracted information.

This project was inspired and based on the names-dataset project by Philippe Remy.

Install

install the package:

pip install names-oracle

download the dataset:

python -m names_oracle download_dataset

Note: the dataset (asset from this repository), is around ~600MB so it may take a while.

test:

python -m names_oracle test

How to use

Import:

>>> import names_oracle

The module is composed of just a couple of functions. The most important being:

get_name_info(name : str, country : str) -> Union[Dict[str: float], None]

This function retrieves the data(if present) for the given name for a given country. The list of available countries can be retrieved through the get_available_countries() -> list[str] function. The available countries are:

>>> import names_oracle
>>> names_oracle.get_available_countries()
['AE', 'AF', 'AL', 'ALL', 'AO', 'AR', 'AT', 'AZ', 'BD', 'BE', 'BF', 'BG', 'BH', 'BI', 'BN', 'BO', 'BR', 'BW', 'CA', 'CH', 'CL', 'CM', 'CN', 'CO', 'CR', 'CY', 'CZ', 'DE', 'DJ', 'DK', 'DZ', 'EC', 'EE', 'EG', 'ES', 'ET', 'FI', 'FJ', 'FR', 'GB', 'GE', 'GH', 'GR', 'GT', 'HK', 'HN', 'HR', 'HT', 'HU', 'ID', 'IE', 'IL', 'IN', 'IQ', 'IR', 'IS', 'IT', 'JM', 'JO', 'JP', 'KH', 'KR', 'KW', 'KZ', 'LB', 'LT', 'LU', 'LY', 'MA', 'MD', 'MO', 'MT', 'MU', 'MV', 'MX', 'MY', 'NA', 'NG', 'NL', 'NO', 'OM', 'PA', 'PE', 'PH', 'PL', 'PR', 'PS', 'PT', 'QA', 'RS', 'RU', 'SA', 'SD', 'SE', 'SG', 'SI', 'SV', 'SY', 'TM', 'TN', 'TR', 'TW', 'US', 'UY', 'YE', 'ZA']

If you want to retrieve the data for all countries merged you can use "ALL" as country code.

Note: The first time you retrieve information for a country its database is loaded in memory, this is why the first call is always slow.

Note: The ALL database is the biggest one (~1Gb) as it contains all other databases merged together, therefore it takes some time to load.

>>> from pprint import pprint
>>> import names_oracle
>>> name_data = names_oracle.get_name_info("Andrea", "IT")
>>> pprint(name_data)
{'female_frequency': 11707,
 'female_probability': 0.026372522296236392,
 'first_name_frequency': 443909,
 'first_name_norm_frequency': 0.916561364387182,
 'first_name_probability': 0.9750327274005218,
 'last_name_frequency': 11367,
 'last_name_norm_frequency': 0.08547773382863846,
 'last_name_probability': 0.02496727259947812,
 'male_frequency': 432202,
 'male_probability': 0.9736274777037636,
 'name': 'Andrea'}
>>> name_data = names_oracle.get_name_info("Andrea", "DE")
>>> pprint(name_data)
{'female_frequency': 23848,
 'female_probability': 0.9767765717796436,
 'first_name_frequency': 24415,
 'first_name_norm_frequency': 0.3445964065433092,
 'first_name_probability': 0.9890221177995625,
 'last_name_frequency': 271,
 'last_name_norm_frequency': 0.0062432326583269976,
 'last_name_probability': 0.010977882200437494,
 'male_frequency': 567,
 'male_probability': 0.023223428220356338,
 'name': 'Andrea'}

To determine which part of a name is the first name and which part is the last name you can use the following function:

split_name_in_first_and_last(name, country) -> list

This function returns a string with te first and last name or an empty list if it couldn't determine them.

>>> import names_oracle
>>> name_parts = names_oracle.split_name_in_first_and_last("Leonardo La Rocca", "IT")
>>> print(name_parts)
['Leonardo', 'La Rocca']
>>> name_parts = names_oracle.split_name_in_first_and_last("La Rocca Leonardo", "IT")
>>> print(name_data)
['Leonardo', 'La Rocca']

Sources

The database was generated from the Facebook massive dump (533M users).

You can download the original dataset here.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
names_oracle		names_oracle
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Names-Oracle

Install

How to use

Sources

About

Releases 1

Packages

Languages

License

leoli51/Names-Oracle

Folders and files

Latest commit

History

Repository files navigation

Names-Oracle

Install

How to use

Sources

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages