# Data Exploration

## Instructions
There are several transaction-related data files saved under the [data](../data) folder:
- Look into the data using appropriate functions and extract the fields in each file.
- For each data file, describe what the data is about and which fields are saved.

You need to answer the questions and perform the task below:
- How many transactions are in GBP?
- How many transactions are NOT in USD?
- What are the average and median transaction amounts in USD?
- Construct a table showing the number of transactions in EACH currency

Note:
- You are NOT ALLOWED to import any other libraries or packages
- You can write your own functions
- Your answers should be readable, with appropriate comments
- You can refer to the [markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) if you are not familiar with Markdown

## Import libraries 

In [1]:
# Usual libraries are imported here
import os
import yaml
import dask.dataframe as dd
import pandas as pd
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Please perform your tasks below and answer the questions

### countries
The "countries.csv" file describes 226 countries, giving each country's Name, Code, Code3, Numcode and Phonecode.

In [20]:
countries = pd.read_csv('../data/countries.csv')
print(len(countries))
countries.head(5)

226


Unnamed: 0,CODE,NAME,CODE3,NUMCODE,PHONECODE
0,AF,Afghanistan,AFG,4,93
1,AL,Albania,ALB,8,355
2,DZ,Algeria,DZA,12,213
3,AS,American Samoa,ASM,16,1684
4,AO,Angola,AGO,24,244
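Every table in this notebook starts with an `Unnamed: 0` column. That is usually what pandas shows when a CSV was written together with its row index, leaving the first header cell empty. The sketch below assumes that is the case here (the raw header line is a guess, not taken from the file) and reads the first column as the index with `index_col=0`. The inline text copies the first rows shown above so that the example runs without the data folder.

```python
import io
import pandas as pd

# First rows of countries.csv as displayed above. ASSUMPTION: the raw file
# has an empty first header cell, which pandas names "Unnamed: 0" by default.
sample_csv = """,CODE,NAME,CODE3,NUMCODE,PHONECODE
0,AF,Afghanistan,AFG,4,93
1,AL,Albania,ALB,8,355
2,DZ,Algeria,DZA,12,213
"""

# Default read: the saved index comes back as an extra data column
with_extra = pd.read_csv(io.StringIO(sample_csv))
print(list(with_extra.columns))

# index_col=0 treats the first column as the row index instead
countries_sample = pd.read_csv(io.StringIO(sample_csv), index_col=0)
print(list(countries_sample.columns))
```

The same `index_col=0` argument could be passed to the other `pd.read_csv` calls in this notebook if their files follow the same layout.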


### currency
The "currency_details.csv" file describes 184 currencies, giving each currency's code (CCY), its Exponent and whether it is a cryptocurrency.

In [21]:
currency = pd.read_csv('../data/currency_details.csv')
print(len(currency))
currency.head(5)

184


Unnamed: 0,CCY,EXPONENT,IS_CRYPTO
0,AED,2,False
1,AFN,2,False
2,ALL,2,False
3,AMD,2,False
4,ANG,2,False


### fraudsters
The "fraudsters.csv" file lists the user IDs of 298 fraudsters.

In [19]:
fraudsters = pd.read_csv('../data/fraudsters.csv')
print(len(fraudsters))
fraudsters.head(5)

298


Unnamed: 0,USER_ID
0,5270b0f4-2e4a-4ec9-8648-2135312ac1c4
1,848fc1b1-096c-40f7-b04a-1399c469e421
2,27c76eda-e159-4df3-845a-e13f4e28a8b5
3,a27088ef-9452-403d-9bbb-f7b10180cdda
4,fb23710b-609a-49bf-8a9a-be49c59ce6de


### german_credit_data
The "german_credit_data.csv" file describes 1000 users who took a credit loan from a bank, including their Age, Sex, Job, Housing, Saving accounts, Checking account, Credit amount, Duration and Purpose.

In [28]:
credit = pd.read_csv('../data/german_credit_data.csv')
print(len(credit))
credit.head(5)

1000


Unnamed: 0.1,Unnamed: 0,Age,Sex,Job,Housing,Saving accounts,Checking account,Credit amount,Duration,Purpose
0,0,67,male,2,own,,little,1169,6,radio/TV
1,1,22,female,2,own,little,moderate,5951,48,radio/TV
2,2,49,male,1,own,little,,2096,12,education
3,3,45,male,2,free,little,little,7882,42,furniture/equipment
4,4,53,male,2,free,little,little,4870,24,car


### users
The "users.csv" file describes 9944 users, including their ID, Phone country, Terms version, Created date, State, Country, Birth year, KYC status, whether they have an email, whether they are fraudsters and how many failed sign-in attempts they have made.

In [29]:
users = pd.read_csv('../data/users.csv')
print(len(users))
users.head(5)

9944


Unnamed: 0,ID,HAS_EMAIL,PHONE_COUNTRY,IS_FRAUDSTER,TERMS_VERSION,CREATED_DATE,STATE,COUNTRY,BIRTH_YEAR,KYC,FAILED_SIGN_IN_ATTEMPTS
0,1872820f-e3ac-4c02-bdc7-727897b60043,1,GB||JE||IM||GG,False,2018-05-25,2017-08-06 07:33:33.341000,ACTIVE,GB,1971,PASSED,0
1,545ff94d-66f8-4bea-b398-84425fb2301e,1,GB||JE||IM||GG,False,2018-01-01,2017-03-07 10:18:59.427000,ACTIVE,GB,1982,PASSED,0
2,10376f1a-a28a-4885-8daa-c8ca496026bb,1,ES,False,2018-09-20,2018-05-31 04:41:24.672000,ACTIVE,ES,1973,PASSED,0
3,fd308db7-0753-4377-879f-6ecf2af14e4f,1,FR,False,2018-05-25,2018-06-01 17:24:23.852000,ACTIVE,FR,1986,PASSED,0
4,755fe256-a34d-4853-b7ca-d9bb991a86d3,1,GB||JE||IM||GG,False,2018-09-20,2017-08-09 15:03:33.945000,ACTIVE,GB,1989,PASSED,0


### transactions
The "transactions.csv" file records 688651 transactions, including the Currency, Amount, State, Created date, Merchant category, Merchant country, Entry method, User ID, Type, Source, ID and Amount USD.

In [30]:
transactions = pd.read_csv('../data/transactions.csv')
print(len(transactions))
transactions.head(5)

688651


Unnamed: 0,CURRENCY,AMOUNT,STATE,CREATED_DATE,MERCHANT_CATEGORY,MERCHANT_COUNTRY,ENTRY_METHOD,USER_ID,TYPE,SOURCE,ID,AMOUNT_USD
0,GBP,175,COMPLETED,2017-12-20 12:46:20.294,cafe,GBR,cont,8f99c254-7cf2-4e35-b7e4-53804d42445d,CARD_PAYMENT,GAIA,b3332e6f-7865-4d6e-b6a5-370bc75568d8,220
1,EUR,2593,COMPLETED,2017-12-20 12:38:47.232,bar,AUS,cont,ed773c34-2b83-4f70-a691-6a7aa1cb9f11,CARD_PAYMENT,GAIA,853d9ff8-a007-40ef-91a2-7d81e29a309a,2885
2,EUR,1077,COMPLETED,2017-12-20 12:34:39.668,,CZE,cont,eb349cc1-e986-4bf4-bb75-72280a7b8680,CARD_PAYMENT,GAIA,04de8238-7828-4e46-91f1-050a9aa7a9df,1198
3,GBP,198,COMPLETED,2017-12-20 12:45:50.555,supermarket,GBR,cont,dc78fbc4-c936-45d3-a813-e2477ac6d74b,CARD_PAYMENT,GAIA,2b790b9b-c312-4098-a4b3-4830fc8cda53,249
4,EUR,990,COMPLETED,2017-12-20 12:45:32.722,,FRA,cont,32958a5c-2532-42f7-94f9-127f2a812a55,CARD_PAYMENT,GAIA,6469fc3a-e535-41e9-91b9-acb46d1cc65d,1101
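Each description above was produced by reading a file, printing its length and looking at `head(5)`. As a possible complement, the sketch below lists every field with its dtype and number of missing values. The helper name `describe_fields` is hypothetical (it is not part of pandas), and the small frame only copies three columns from the first three transaction rows shown above.

```python
import pandas as pd

def describe_fields(df):
    """Return one row per column: its dtype and its number of missing values."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'missing': df.isna().sum(),
    })

# Three columns from the first three rows of transactions.head(5) above;
# the third row has no merchant category.
sample = pd.DataFrame({
    'CURRENCY': ['GBP', 'EUR', 'EUR'],
    'AMOUNT': [175, 2593, 1077],
    'MERCHANT_CATEGORY': ['cafe', 'bar', None],
})

summary = describe_fields(sample)
print(len(sample), 'rows')
print(summary)
```

Calling `describe_fields(transactions)` on the full frame would show the same summary for all of its columns.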


### Questions
- How many transactions are in GBP?
- How many transactions are NOT in USD?
- What are the average and median transaction amounts in USD?
- Construct a table showing the number of transactions in EACH currency

In [47]:
# Select rows with boolean masks on the CURRENCY column.
# "Transaction in USD" is read here as a row whose CURRENCY is USD, so the
# statistics use AMOUNT for those rows (AMOUNT_USD is not used).
print('There are', len(transactions[transactions['CURRENCY'] == 'GBP']), 'transactions in GBP.')
print('There are', len(transactions[transactions['CURRENCY'] != 'USD']), 'transactions not in USD.')
print('The average transaction in USD is', np.mean(transactions[transactions['CURRENCY'] == 'USD'].AMOUNT), '.')
print('The median transaction in USD is', np.median(transactions[transactions['CURRENCY'] == 'USD'].AMOUNT), '.')

There are 339091 transactions in GBP.
There are 657109 transactions not in USD.
The average transaction in USD is 11598.75470800837 .
The median transaction in USD is 2000.0 .
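The last question asks for a table of the number of transactions in each currency, which the cell above does not produce. One way to build it, sketched here, is `Series.value_counts()`. The example runs on the five `CURRENCY` values from `transactions.head(5)` so that it is self-contained; the column name `N_TRANSACTIONS` is an arbitrary choice for illustration.

```python
import pandas as pd

# CURRENCY values of the five rows shown by transactions.head(5)
sample = pd.DataFrame({'CURRENCY': ['GBP', 'EUR', 'EUR', 'GBP', 'EUR']})

# value_counts() counts rows per distinct value, sorted by descending count;
# rename_axis/reset_index turn the resulting Series into a two-column table.
currency_counts = (
    sample['CURRENCY']
    .value_counts()
    .rename_axis('CURRENCY')
    .reset_index(name='N_TRANSACTIONS')
)
print(currency_counts)
```

Applied to `transactions['CURRENCY']`, the same chain gives one row per currency in the full data, and its GBP row should equal the GBP count printed above.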
