# Using Hypothesis to Generate Random Pandas DataFrames
When I started working with [hypothesis](https://hypothesis.readthedocs.io), I couldn't find any examples showing how to generate a pandas DataFrame. The following is my solution, meant as a starting point.

In [1]:
import hypothesis.strategies as st
from hypothesis import given
from hypothesis.extra.datetime import dates
from functools import partial
import string
import pandas as pd

## Column Strategy
First I defined a strategy that generates dictionaries whose values are all Python lists of the same length. The generated dictionary is passed to pandas to create the DataFrame.

In [2]:
# Return a dictionary whose entries are lists of length n: random integers under 'A', random floats under 'B'.
@st.cacheable
@st.defines_strategy
def column_st(n):
    return st.fixed_dictionaries({
                           'A':st.lists(st.integers(), min_size=n, max_size=n),
                           'B':st.lists(st.floats(), min_size=n, max_size=n),
                           })
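To make the equal-length guarantee concrete, here is a minimal sketch of a property check on the dictionary strategy. It is written as a plain function without the `cacheable` and `defines_strategy` decorators, and the names `fixed_columns`, `check_columns`, and `N_ROWS` are illustrative assumptions rather than part of the original notebook.

```python
import pandas as pd
import hypothesis.strategies as st
from hypothesis import given, settings

N_ROWS = 3  # illustrative fixed row count, not a value from the original


def fixed_columns(n):
    """Dictionary strategy: two lists of length n under keys 'A' and 'B'."""
    return st.fixed_dictionaries({
        'A': st.lists(st.integers(), min_size=n, max_size=n),
        'B': st.lists(st.floats(), min_size=n, max_size=n),
    })


@settings(max_examples=25)
@given(fixed_columns(N_ROWS))
def check_columns(columns):
    # Every generated list has exactly N_ROWS elements ...
    assert all(len(values) == N_ROWS for values in columns.values())
    # ... so pandas can assemble them into a rectangular frame.
    frame = pd.DataFrame(columns)
    assert frame.shape == (N_ROWS, 2)
    assert set(frame.columns) == {'A', 'B'}


check_columns()
```

Pinning `min_size` and `max_size` to the same value is what forces every list to the same length; if the sizes differed, `pd.DataFrame` would reject the dictionary.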




## Timeseries Strategy
A second strategy uses the column strategy to create a DataFrame with a date index. The third argument to `st.builds` is a strategy that chooses the number of rows for the DataFrame, in this case between 1 and 10, and then passes that number to the column strategy via `flatmap`. The lambda takes a random date to use as the starting date, plus the generated dictionary, and constructs the DataFrame.

In [3]:
@st.cacheable
@st.defines_strategy
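def timeseries(column_strategy, min_year=2011, max_year=2020):
    return st.builds(
        # x is the start date, y is the dictionary of equal-length lists.
        # next(iter(...)) reads the row count on both Python 2 and Python 3.
        lambda x, y: pd.DataFrame(y).set_index(
            pd.date_range(x, periods=len(next(iter(y.values()))))
        ),
        dates(min_year=min_year, max_year=max_year),
        st.integers(min_value=1, max_value=10).flatmap(column_strategy))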
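For comparison, here is a sketch of the same idea that assumes a recent Hypothesis release, where dates come from `st.dates(min_value=..., max_value=...)` in the core strategies module. The names `column_lists`, `build_frame`, and `timeseries_frames` are hypothetical stand-ins for `column_st` and `timeseries`, and treating `min_year`/`max_year` as inclusive calendar-year bounds is my assumption.

```python
import datetime

import pandas as pd
import hypothesis.strategies as st


def column_lists(n):
    """Two columns of n values each, mirroring column_st."""
    return st.fixed_dictionaries({
        'A': st.lists(st.integers(), min_size=n, max_size=n),
        'B': st.lists(st.floats(), min_size=n, max_size=n),
    })


def build_frame(start, columns):
    """Index the column data by consecutive days beginning at start."""
    n_rows = len(next(iter(columns.values())))
    return pd.DataFrame(columns).set_index(pd.date_range(start, periods=n_rows))


def timeseries_frames(column_strategy, min_year=2011, max_year=2020):
    """Hypothetical variant of timeseries() using st.dates bounds."""
    return st.builds(
        build_frame,
        st.dates(
            min_value=datetime.date(min_year, 1, 1),
            max_value=datetime.date(max_year, 12, 31),
        ),
        # Draw the row count first, then a dictionary sized to match it.
        st.integers(min_value=1, max_value=10).flatmap(column_strategy),
    )
```

`flatmap` is what lets the second draw depend on the first: the integer is generated, handed to `column_strategy`, and the strategy it returns produces the final dictionary.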


In [4]:
timeseries(column_st).example()

| | A | B |
|---|---|---|
| 2018-03-10 | 2124305603451444508 | NaN |
| 2018-03-11 | -4 | -8.737824e+18 |
| 2018-03-12 | 1 | -0.5 |
| 2018-03-13 | 459 | 3.811456e+17 |
| 2018-03-14 | -2161139410193663 | -1.435188e+18 |
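The sample above shows why unconstrained element strategies can be awkward: column B has a missing entry in its first row, and both columns mix tiny and enormous magnitudes. If a test needs finite, moderately sized data, the element strategies can be narrowed. The following is a sketch only; `bounded_columns`, `check_bounded`, and the specific bounds are illustrative assumptions.

```python
import math

import hypothesis.strategies as st
from hypothesis import given, settings


def bounded_columns(n):
    """Like column_st, but with bounded integers and finite floats.

    The bounds are illustrative assumptions, not values from the original.
    """
    return st.fixed_dictionaries({
        'A': st.lists(st.integers(min_value=-1000, max_value=1000),
                      min_size=n, max_size=n),
        'B': st.lists(st.floats(min_value=-1e6, max_value=1e6,
                                allow_nan=False, allow_infinity=False),
                      min_size=n, max_size=n),
    })


@settings(max_examples=25)
@given(bounded_columns(4))
def check_bounded(columns):
    # Integers stay inside the requested range.
    assert all(-1000 <= a <= 1000 for a in columns['A'])
    # Floats are finite and inside the requested range.
    assert all(math.isfinite(b) and -1e6 <= b <= 1e6 for b in columns['B'])


check_bounded()
```

Whether to constrain the data depends on the code under test: leaving NaN and extreme values in is often the point of property-based testing, so narrowing is best reserved for inputs the function is documented not to accept.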


## Generate Random Columns
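This second column strategy generates columns with random names. Each of the `columns` columns is labelled with 3 to 6 uppercase letters and holds `n` integers between 0 and 100.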

In [5]:
# Return a dictionary with `columns` randomly labelled entries, each a list of n integers.
@st.cacheable
@st.defines_strategy
def random_column_st(n,columns):
    return st.dictionaries(st.text(string.ascii_uppercase,min_size=3,max_size=6), 
                           values=st.lists(st.integers(min_value=0,max_value=100), min_size=n, max_size=n),
                           min_size=columns,max_size=columns)


In [6]:
timeseries(partial(random_column_st,columns=5)).example()

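| | CICKJ | CLZIWK | DNISVB | MABOLR | ZNQROX |
|---|---|---|---|---|---|
| 2014-05-24 | 84 | 78 | 92 | 22 | 34 |
| 2014-05-25 | 16 | 47 | 84 | 68 | 96 |
| 2014-05-26 | 40 | 98 | 42 | 80 | 59 |
| 2014-05-27 | 44 | 94 | 72 | 0 | 80 |
| 2014-05-28 | 55 | 30 | 48 | 89 | 36 |
| 2014-05-29 | 30 | 34 | 3 | 90 | 82 |
| 2014-05-30 | 77 | 78 | 34 | 3 | 43 |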
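The examples so far only call `.example()`, but the strategies are meant to feed `@given` tests. Here is a minimal sketch of such a test. It leaves out the date index to stay short, and the names `random_columns`, `frames`, and `test_frame_shape_and_range` are hypothetical, as is the property being checked.

```python
import string
from functools import partial

import pandas as pd
import hypothesis.strategies as st
from hypothesis import given, settings


def random_columns(n, columns):
    """`columns` randomly named lists, each holding n integers in [0, 100]."""
    return st.dictionaries(
        keys=st.text(string.ascii_uppercase, min_size=3, max_size=6),
        values=st.lists(st.integers(min_value=0, max_value=100),
                        min_size=n, max_size=n),
        min_size=columns, max_size=columns,
    )


def frames(column_strategy):
    """Frames of 1 to 10 rows built from a column strategy (no date index)."""
    return (st.integers(min_value=1, max_value=10)
            .flatmap(column_strategy)
            .map(pd.DataFrame))


@settings(max_examples=25)
@given(frames(partial(random_columns, columns=5)))
def test_frame_shape_and_range(df):
    assert 1 <= len(df) <= 10
    assert df.shape[1] == 5
    # Every value stays inside the bounds requested from st.integers.
    assert ((df >= 0) & (df <= 100)).all().all()


test_frame_shape_and_range()
```

In a real test suite the final call would be dropped and a runner such as pytest would collect the function. When an assertion fails, Hypothesis reports a shrunken DataFrame that still triggers the failure.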
