# Creating DataFrames

#### Good ol' PYTHONPATH fix; details [here](https://stackoverflow.com/questions/16114391/adding-directory-to-sys-path-pythonpath)

In [1]:
import sys
import os
# Put the parent directory first on the import path so the local pnguin package is found
sys.path.insert(0, os.path.join(os.getcwd(), "../"))

In [2]:
import pnguin as pg

## Creating some mock data

In [5]:
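# Row format: a list of dicts, one dict per row
ld = [{"name": "Raghav", "info": "idk"}, {"name": "Someone Else", "info": "Something else?"}]
# Column format: a dict of lists, one list per column
dl = {"name": ["Raghav", "Someone Else"], "info": ["idk", "Something else?"]}
# Sample CSV files, located relative to the current working directory
header_csv = os.path.join(os.getcwd(), "../test/data/headerdata.csv")
no_header_csv = os.path.join(os.getcwd(), "../test/data/noheaderdata.csv")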

### pnguin supports two primary ways to create a DataFrame: from a variable, or from a CSV file. I've stopped at these two for now because, to be honest, they're the two data sources I use most. If you would like to add another source, please feel free to open a [PR](https://github.com/raghavmecheri/pnguin/pulls)

Note: The point of this library is to stay lightweight and minimalist! So if you want to use another data source that can already export its data as, say, a list of dicts (or a dict of lists!), adding an extra line to your own code isn't the worst thing :)
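For example, a query result from Python's standard-library `sqlite3` module is one list comprehension away from a list of dicts. This is a minimal sketch: the `people` table and the `query_as_rows` helper are hypothetical and not part of pnguin.

```python
import sqlite3


def query_as_rows(conn, sql):
    """Run `sql` and return the result in row format (a list of dicts)."""
    cursor = conn.execute(sql)
    # cursor.description has one tuple per result column; item 0 is the column name
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, record)) for record in cursor.fetchall()]


# Hypothetical in-memory table standing in for "some other data source"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, info TEXT)")
conn.executemany(
    "INSERT INTO people (name, info) VALUES (?, ?)",
    [("Raghav", "idk"), ("Someone Else", "Something else?")],
)

rows = query_as_rows(conn, "SELECT name, info FROM people ORDER BY rowid")
conn.close()
print(rows)
# [{'name': 'Raghav', 'info': 'idk'}, {'name': 'Someone Else', 'info': 'Something else?'}]
```

`rows` has the same shape as `ld`, so it could then be handed to `pg.DataFrame(rows, axis="row")`.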

## From a list of dicts (or a dict of lists!)

In [6]:
df = pg.DataFrame(ld, axis="row")

In [7]:
df.head()

name          info
------------  ---------------
Raghav        idk
Someone Else  Something else?

In [8]:
df = pg.DataFrame(dl, axis="col")

In [9]:
df.head()

name          info
------------  ---------------
Raghav        idk
Someone Else  Something else?
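`ld` and `dl` hold the same table in two layouts, which is why both calls above print the same thing. A small helper makes the correspondence concrete; `columns_to_rows` is hypothetical, plain Python, and not a pnguin function.

```python
def columns_to_rows(columns):
    """Convert column format (dict of lists) to row format (list of dicts)."""
    names = list(columns)
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*...) walks the columns in lockstep, yielding one tuple per row
    return [dict(zip(names, values)) for values in zip(*columns.values())]


ld = [{"name": "Raghav", "info": "idk"}, {"name": "Someone Else", "info": "Something else?"}]
dl = {"name": ["Raghav", "Someone Else"], "info": ["idk", "Something else?"]}

print(columns_to_rows(dl) == ld)  # True
```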

### You can also give pnguin data in row format (a list of dicts) and ask for a column axis; it'll convert the data for you!

In [10]:
df = pg.DataFrame(ld, axis="col")

In [11]:
df

name          info
------------  ---------------
Raghav        idk
Someone Else  Something else?
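A minimal sketch of that row-to-column conversion in plain Python. It illustrates the idea and is not pnguin's actual implementation; in particular, filling a key that is missing from some row with `None` is an assumption.

```python
def rows_to_columns(rows):
    """Convert row format (list of dicts) to column format (dict of lists)."""
    # Column names in first-seen order (dict.fromkeys acts as an ordered set)
    names = list(dict.fromkeys(key for row in rows for key in row))
    # Assumption: a key missing from a row becomes None in that column
    return {name: [row.get(name) for row in rows] for name in names}


ld = [{"name": "Raghav", "info": "idk"}, {"name": "Someone Else", "info": "Something else?"}]
print(rows_to_columns(ld))
# {'name': ['Raghav', 'Someone Else'], 'info': ['idk', 'Something else?']}
```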

## From a CSV file

### CSV files with and without headers both work. By default, pnguin treats the first row as a header; for a file without one, you pass the headers in yourself

### If you have a CSV file with headers, it's easy

In [14]:
df = pg.read_csv(header_csv)

In [15]:
df.head()

Name       Age  Something    Something2    Something3
-------  -----  -----------  ------------  ------------
Raghav      20  a            b             c
Raghav2     21  a            b             c
Raghav3     19  a            d             c
Raghav4     22  d            b             c
Raghav5     21  aaaa         b             e
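For comparison, Python's standard-library `csv.DictReader` follows the same first-row-is-the-header convention. This is a stdlib analogue, not pnguin's implementation; the inline CSV text is the first two rows of the output above, so the sketch runs without the test file. Note that the `csv` module returns every field as a string.

```python
import csv
import io

# First two rows of the header CSV shown above, inlined for a self-contained example
header_text = """Name,Age,Something,Something2,Something3
Raghav,20,a,b,c
Raghav2,21,a,b,c
"""

# With no fieldnames argument, DictReader reads the first row as the header
reader = csv.DictReader(io.StringIO(header_text))
rows = list(reader)
print(reader.fieldnames)  # ['Name', 'Age', 'Something', 'Something2', 'Something3']
print(rows[0]["Name"], rows[0]["Age"])  # Raghav 20
```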

### If you have a CSV file without headers, you can pass in a list of headers instead

In [16]:
df = pg.read_csv(no_header_csv, headers=["Name", "Age", "Something", "Something2", "Something3"])

In [17]:
df.head()

Name       Age  Something    Something2    Something3
-------  -----  -----------  ------------  ------------
Raghav      20  a            b             c
Raghav2     21  a            b             c
Raghav3     19  a            d             c
Raghav4     22  d            b             c
Raghav5     21  aaaa         b             e
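The stdlib analogue for a headerless file is to hand `csv.DictReader` the column names through its `fieldnames` argument, which makes it treat the first line as data. As before, this is a sketch for comparison rather than pnguin's implementation, and the inline rows are copied from the output above.

```python
import csv
import io

# First two rows of the headerless CSV shown above
no_header_text = """Raghav,20,a,b,c
Raghav2,21,a,b,c
"""
headers = ["Name", "Age", "Something", "Something2", "Something3"]

# Supplying fieldnames tells DictReader there is no header row to consume
rows = list(csv.DictReader(io.StringIO(no_header_text), fieldnames=headers))
print(len(rows))        # 2
print(rows[0]["Name"])  # Raghav
```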