## <img src="./analytics_logo.jpg" width="80" height="80"/> Mini Project: Predict Gender from Name

In this project I build a DataFrame from scratch using fake data generated with the Faker library, then use the names-dataset library to predict the gender associated with each name.

## Prerequisites

This project uses the following libraries:
- [`Faker`](https://github.com/joke2k/faker): To generate fake names
- [`names-dataset`](https://github.com/philipperemy/name-dataset): To get gender and country info
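
For orientation, here is a minimal sketch of how gender information could be read once a name has been looked up. It assumes the lookup returns a nested dictionary in which `first_name` holds a `gender` mapping from label to probability, as described in the names-dataset README. The names `most_likely_gender`, `min_confidence`, and `sample` are hypothetical, and the sample values are hand-written, not real library output.

```python
def most_likely_gender(search_result, min_confidence=0.5):
    """Return the most probable gender label, or None if unknown or uncertain.

    `search_result` is assumed to look like
    {"first_name": {"gender": {"Female": p1, "Male": p2}}, ...}.
    """
    first = (search_result or {}).get("first_name") or {}
    genders = first.get("gender") or {}
    if not genders:
        return None
    # Pick the label with the highest probability.
    label, prob = max(genders.items(), key=lambda kv: kv[1])
    return label if prob >= min_confidence else None


# Hand-written sample in the assumed shape (illustrative values only).
sample = {"first_name": {"gender": {"Female": 0.003, "Male": 0.997}}, "last_name": None}
print(most_likely_gender(sample))
```

Returning `None` for missing or low-confidence results keeps unknown names distinguishable from confident predictions.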

In [None]:
!pip install Faker
!pip install names-dataset

## Let's Start

#### Import required libraries

In [20]:
import pandas as pd
import numpy as np
from faker import Faker
from faker.providers import internet
from names_dataset import NameDataset, NameWrapper

#### Generate fake data with Faker

In [42]:
faker = Faker()

In [65]:
names = np.array([faker.name() for _ in range(1000)], dtype='U31')
names[:10]

array(['David Baker', 'Paige Long', 'Nicole Arias DDS', 'Lisa Brooks',
       'Melvin Bailey', 'Courtney Lynn', 'Dylan Vasquez', 'Keith Shaffer',
       'John Fowler', 'Kevin Johnson'], dtype='<U31')
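
Generated names can carry extras such as the `DDS` suffix in `'Nicole Arias DDS'`, while a gender lookup usually works on the given name alone. The sketch below shows one possible way to pull out the given name. The helper `first_name` and the `PREFIXES` set are hypothetical, the title list is an assumption that may need extending, and `"Dr. David Baker"` is a made-up input for illustration.

```python
# Titles that may precede a given name (assumed list, extend as needed).
PREFIXES = {"mr.", "mrs.", "ms.", "miss", "dr.", "mx."}


def first_name(full_name):
    """Return the first token that is not a known title, or '' if there is none."""
    for token in full_name.split():
        if token.lower() not in PREFIXES:
            return token
    return ""


print(first_name("Nicole Arias DDS"))  # trailing suffix is ignored
print(first_name("Dr. David Baker"))   # leading title is skipped
```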

In [44]:
address = np.array([faker.address() for _ in range(1000)])
address[:10]

array(['781 Laura Drive Apt. 593\nGarciastad, OR 96308',
       '10181 Christopher Forks\nKatelynstad, NH 04549',
       '4141 Connie Roads\nMillerbury, OK 41415',
       '782 Cortez Stream Apt. 139\nNorth Susanview, DE 40247',
       '7334 Nicole Forge Apt. 148\nFieldstown, NE 79103',
       'PSC 5403, Box 9622\nAPO AP 32279',
       '859 Lawrence Freeway\nWest Colechester, KY 03613',
       '804 Hall Run\nEast Juan, TN 06843',
       '93317 Alexandra Courts Apt. 284\nChristopherhaven, PA 02076',
       '8207 Joseph Camp Suite 409\nPatrickland, OH 00949'], dtype='<U62')
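
Each generated address contains an embedded newline between the street line and the locality line. If separate fields are easier to work with, a split along that newline could look like the sketch below. `split_address` is a hypothetical helper, and the input string is copied from the output above.

```python
def split_address(address):
    """Split a two-line address into (street, locality) at the first newline."""
    street, _, locality = address.partition("\n")
    return street, locality


street, locality = split_address("781 Laura Drive Apt. 593\nGarciastad, OR 96308")
print(street)
print(locality)
```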

In [47]:
comment = np.array([faker.sentence() for _ in range(1000)])
comment[:10]

array(['Money threat whole live east political within.',
       'Clear fast change like yet step.',
       'Cell computer mother manager degree agent whose.',
       'Difficult if must serve.', 'Project candidate line.',
       'Response just today understand coach than her the.',
       'Center company news yard.',
       'Billion professional fire build recent.', 'Add guess your car.',
       'Share too young response each always.'], dtype='<U67')

In [50]:
# Register the internet provider on the Faker instance created above (named `faker`, not `fake`)
faker.add_provider(internet)
ip = np.array([faker.ipv4_private() for _ in range(1000)])
ip[:10]

array(['10.128.210.201', '172.17.239.64', '172.29.104.235',
       '10.2.219.223', '172.28.112.53', '192.168.184.9', '10.141.119.31',
       '192.168.63.189', '172.24.44.86', '10.73.221.32'], dtype='<U15')
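
As a quick sanity check, the standard-library `ipaddress` module can confirm that generated addresses fall in private ranges. This is only a sketch: `sample_ips` holds three values copied from the output above, and `all_private` is a hypothetical name.

```python
import ipaddress

# Three addresses copied from the generated output above.
sample_ips = ["10.128.210.201", "172.17.239.64", "192.168.184.9"]

# is_private is True for addresses in ranges such as 10/8, 172.16/12 and 192.168/16.
all_private = all(ipaddress.ip_address(ip).is_private for ip in sample_ips)
print(all_private)
```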

In [53]:
age = np.random.randint(1,100,1000)
age[:10]

array([16, 55, 20, 11, 51,  4,  5, 94, 65, 72])
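
Note that `np.random.randint(1, 100, 1000)` excludes the upper bound, so ages range from 1 to 99. If reproducible values are wanted, one option is NumPy's seeded `Generator` API, sketched below. The seed value and the name `age_seeded` are arbitrary choices for illustration, and the seeded draw will not reproduce the numbers shown above.

```python
import numpy as np

# A seeded generator yields the same sequence on every run (seed value is arbitrary).
rng = np.random.default_rng(seed=42)

# integers() excludes the upper bound by default, so values fall in 1..99.
age_seeded = rng.integers(1, 100, size=1000)
print(age_seeded.min(), age_seeded.max())
```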

In [59]:
status = np.random.randint(0,2,1000,dtype=bool)
status[:10]

array([False,  True, False, False,  True,  True,  True,  True,  True,
        True])

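- So far I have generated six NumPy arrays of fake data: four with Faker (names, addresses, comments, IPs) and two with NumPy's random functions (ages, statuses). Now it's time to combine them into a DataFrame.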

In [75]:
df = pd.DataFrame(data=[names, age, address, comment, ip, status])

In [76]:
df = df.T
df

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | David Baker | 16 | 781 Laura Drive Apt. 593\nGarciastad, OR 96308 | Money threat whole live east political within. | 10.128.210.201 | False |
| 1 | Paige Long | 55 | 10181 Christopher Forks\nKatelynstad, NH 04549 | Clear fast change like yet step. | 172.17.239.64 | True |
| 2 | Nicole Arias DDS | 20 | 4141 Connie Roads\nMillerbury, OK 41415 | Cell computer mother manager degree agent whose. | 172.29.104.235 | False |
| 3 | Lisa Brooks | 11 | 782 Cortez Stream Apt. 139\nNorth Susanview, D... | Difficult if must serve. | 10.2.219.223 | False |
| 4 | Melvin Bailey | 51 | 7334 Nicole Forge Apt. 148\nFieldstown, NE 79103 | Project candidate line. | 172.28.112.53 | True |
| ... | ... | ... | ... | ... | ... | ... |
| 995 | Matthew Owens | 45 | USCGC Jones\nFPO AP 55681 | Respond meeting paper third. | 10.244.14.20 | True |
| 996 | Jamie Crawford | 47 | 80546 Guerra Brooks Apt. 443\nRiverahaven, CA ... | Common almost only check paper old cup. | 10.133.251.246 | False |
| 997 | Shannon Chandler | 27 | 9798 Blair Walk\nNorth Pamela, NH 28129 | Away bring foreign become son available contro... | 10.220.204.70 | False |
| 998 | Hannah Cooper | 20 | 309 Morris Land\nEthanbury, MS 32060 | Any low coach positive continue between also. | 192.168.118.218 | True |
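
Stacking the arrays as rows and then transposing works, but it leaves every column with `object` dtype, because each original row mixed strings, integers, and booleans. One alternative, sketched here under the assumption that typed columns are preferable, is to pass a dict of columns so each array keeps its own dtype and the column names are set up front. The three-element arrays are small stand-ins with values copied from the outputs above, and `df_transposed` and `df_direct` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Three-element stand-ins for the 1000-element arrays (values copied from the outputs above).
names = np.array(["David Baker", "Paige Long", "Nicole Arias DDS"])
age = np.array([16, 55, 20])
status = np.array([False, True, False])

# Row-stack then transpose, as in the cells above: columns end up as object dtype.
df_transposed = pd.DataFrame(data=[names, age, status]).T
df_transposed.columns = ["name", "age", "status"]

# Dict of columns: each array keeps its own dtype and columns are named immediately.
df_direct = pd.DataFrame({"name": names, "age": age, "status": status})

print(df_transposed.dtypes)
print(df_direct.dtypes)
```

With typed columns, numeric operations such as `df_direct["age"].mean()` work without an extra conversion step.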


In [78]:
df.columns = ['name', 'age', 'address', 'comment', 'ip', 'status']

In [79]:
df

| | name | age | address | comment | ip | status |
|---|---|---|---|---|---|---|
| 0 | David Baker | 16 | 781 Laura Drive Apt. 593\nGarciastad, OR 96308 | Money threat whole live east political within. | 10.128.210.201 | False |
| 1 | Paige Long | 55 | 10181 Christopher Forks\nKatelynstad, NH 04549 | Clear fast change like yet step. | 172.17.239.64 | True |
| 2 | Nicole Arias DDS | 20 | 4141 Connie Roads\nMillerbury, OK 41415 | Cell computer mother manager degree agent whose. | 172.29.104.235 | False |
| 3 | Lisa Brooks | 11 | 782 Cortez Stream Apt. 139\nNorth Susanview, D... | Difficult if must serve. | 10.2.219.223 | False |
| 4 | Melvin Bailey | 51 | 7334 Nicole Forge Apt. 148\nFieldstown, NE 79103 | Project candidate line. | 172.28.112.53 | True |
| ... | ... | ... | ... | ... | ... | ... |
| 995 | Matthew Owens | 45 | USCGC Jones\nFPO AP 55681 | Respond meeting paper third. | 10.244.14.20 | True |
| 996 | Jamie Crawford | 47 | 80546 Guerra Brooks Apt. 443\nRiverahaven, CA ... | Common almost only check paper old cup. | 10.133.251.246 | False |
| 997 | Shannon Chandler | 27 | 9798 Blair Walk\nNorth Pamela, NH 28129 | Away bring foreign become son available contro... | 10.220.204.70 | False |
| 998 | Hannah Cooper | 20 | 309 Morris Land\nEthanbury, MS 32060 | Any low coach positive continue between also. | 192.168.118.218 | True |
