## Installing the provided library

In [1]:
! cd ../dist/ && pip3 install QPandas-0.20-py3-none-any.whl && cd ../doc/

Processing ./QPandas-0.20-py3-none-any.whl
QPandas is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.


In [2]:
from QPandas.qframe import QFrame, QSeries

## Series
The following simple use cases cover the required functionality described in the doc. The API is similar to that of pandas' pd.Series (see https://pandas.pydata.org/docs/reference/api/pandas.Series.html). The next cells show the following:
- Creating a series
- Accessing attributes, entries and filtering based on conditions
- Combining the series in a frame

In [5]:
# Set up a common index for the series below
idx = list(range(4))
# Create series from list comprehensions
t = QSeries(data=[i % 2 == 0 for i in idx], index=idx, dtype=bool, name="alltrue")
price = QSeries([i**2 / 2 for i in idx], idx, float, name="current_price")
# Series representation
print(t)
print(price)
# Filter a series with a boolean series, then with a condition
print(price[t])
print(t[t == True])

'alltrue'
__________________
|  0   |  True  |
__________________
|  1   |  False  |
__________________
|  2   |  True  |
__________________
|  3   |  False  |
__________________
dtype: <class 'bool'>
'current_price'
__________________
|  0   |  0.0  |
__________________
|  1   |  0.5  |
__________________
|  2   |  2.0  |
__________________
|  3   |  4.5  |
__________________
dtype: <class 'float'>
'current_price[alltrue]'
__________________
|  0   |  0.0  |
__________________
|  2   |  2.0  |
__________________
dtype: <class 'float'>
"alltrue[alltrue == <class 'bool'>]"
__________________
|  0   |  True  |
__________________
|  2   |  True  |
__________________
dtype: <class 'bool'>


As in pandas, we can build queries from conditions: the last two outputs above filter a series with a boolean series and with a comparison.

In [15]:
# Useful attributes and methods
print(len(price))
print(price.index,price.name,price.dtype)
price.set_name("Old price")
print(price)

4
[0, 1, 2, 3] Old price <class 'float'>
'Old price'
__________________
|  0   |  0.0  |
__________________
|  1   |  0.5  |
__________________
|  2   |  2.0  |
__________________
|  3   |  4.5  |
__________________
dtype: <class 'float'>


In [17]:
# Shows how to change the way None entries interact in operations.
# See the implementation of the operators for details.
t1 = QSeries(data=[i % 2 == 0 for i in idx], index=idx, dtype=bool, name="alltrue")
t2 = QSeries(data=[None if i % 2 == 0 else False for i in idx], index=idx, dtype=bool, name="alltrue")
print(t1|t2)
QSeries.allow_none_equality = True
print(t1|t2)

'alltrue | alltrue'
__________________
|  0   |  None  |
__________________
|  1   |  False  |
__________________
|  2   |  None  |
__________________
|  3   |  False  |
__________________
dtype: <class 'bool'>
'alltrue | alltrue'
__________________
|  0   |  True  |
__________________
|  1   |  False  |
__________________
|  2   |  True  |
__________________
|  3   |  False  |
__________________
dtype: <class 'bool'>
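
The two outputs above differ only in how `True | None` is resolved. The following is a minimal pure-Python sketch of that behaviour, not the QPandas implementation: `or_with_none` and its `allow_none` parameter are hypothetical names used for illustration, and treating `None` as `False` in the permissive mode is an assumption, since the example above does not exercise `False | None`.

```python
from typing import Optional


def or_with_none(
    a: Optional[bool], b: Optional[bool], allow_none: bool = False
) -> Optional[bool]:
    """Element-wise OR that is aware of None (hypothetical helper)."""
    if a is None or b is None:
        if not allow_none:
            # Strict mode: any None operand makes the result None.
            return None
        # Permissive mode (assumption): treat None as False.
        return bool(a) or bool(b)
    return a or b


t1 = [True, False, True, False]
t2 = [None, False, None, False]

strict = [or_with_none(x, y) for x, y in zip(t1, t2)]
permissive = [or_with_none(x, y, allow_none=True) for x, y in zip(t1, t2)]

print(strict)      # [None, False, None, False]
print(permissive)  # [True, False, True, False]
```

Both lists match the printed series above for positions 0 to 3.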


In [18]:
# Dtype. Operations between series of the same dtype behave much like pandas
# (None entries are accepted within that dtype). Operations that mix dtypes
# raise an error. Warnings are generally raised when arguments are missing and
# the index or column names had to be inferred from the data.

price[price>1.0] # this works fine
# price[price>1] # this will raise an error. Uncomment to check.

"Old price[Old price > <class 'float'>]"
__________________
|  2   |  2.0  |
__________________
|  3   |  4.5  |
__________________
dtype: <class 'float'>
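
To illustrate why `price > 1.0` works while `price > 1` fails, here is a small sketch of a strict dtype check. It is only an assumption about how such a check could look: `check_operand` is a hypothetical helper, not a QPandas function, and letting `None` through reflects the note above that None entries are accepted within a dtype.

```python
def check_operand(value, dtype):
    """Return value if it fits dtype exactly, else raise TypeError (hypothetical helper)."""
    # None is let through, assuming None entries are valid for every dtype.
    if value is not None and type(value) is not dtype:
        raise TypeError(
            f"expected {dtype.__name__}, got {type(value).__name__}: {value!r}"
        )
    return value


accepted = check_operand(1.0, float)  # same dtype: passes through

try:
    check_operand(1, float)  # int operand against a float series
    rejected = False
except TypeError as exc:
    rejected = True
    print(exc)  # expected float, got int: 1
```

The sketch compares exact types rather than using `isinstance`, because `bool` is a subclass of `int` in Python and an `isinstance` check would silently accept `True` as an integer.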


Accessing elements. This is similar to df.iloc[] in pandas: the integer passed in square brackets is the position of the entry (counting from 0), not its index value. This was my interpretation of the following requirement:

**"overriding the square bracket access operator, which should when given an integer return the individual value at that position"**

This avoids confusion after filtering, when we might be interested in, say, the first n entries. Moreover, the results of upcoming methods such as df.describe() might not have an integer index, so positional access (starting at 0) is the safest choice.
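
The difference between positions and index values is easiest to see after a filter. The toy class below is a sketch under stated assumptions, not QPandas code: `PositionalSeries` and its `filter` method are hypothetical names that only mimic the positional square-bracket access described above.

```python
class PositionalSeries:
    """Toy series: integer keys address positions, not index labels (hypothetical)."""

    def __init__(self, data, index):
        if len(data) != len(index):
            raise ValueError("data and index must have the same length")
        self.data = list(data)
        self.index = list(index)

    def __getitem__(self, key):
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(key, bool) or not isinstance(key, int):
            raise TypeError("this sketch only supports integer positions")
        return self.data[key]

    def filter(self, mask):
        """Keep the entries whose mask value is truthy, preserving their labels."""
        kept = [(v, i) for v, i, m in zip(self.data, self.index, mask) if m]
        return PositionalSeries([v for v, _ in kept], [i for _, i in kept])


price = PositionalSeries([0.0, 0.5, 2.0, 4.5], index=[0, 1, 2, 3])
expensive = price.filter([v > 1.0 for v in price.data])

print(expensive.index)  # [2, 3]
print(expensive[0])     # 2.0
```

After filtering, the remaining index values are 2 and 3, yet `expensive[0]` still returns the first remaining entry.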

## QFrame

Let's create the columns of the following table.

![Screenshot](table.png)


In [22]:
idx = list(range(4))
s = [
    QSeries(data=["X4E", "T3B", "F8D", "C7X"], index=idx, dtype=str, name="SKU"),
    QSeries(data=[7.0, 3.5, 8.0, 6.0], index=idx, dtype=float, name="price"),
    QSeries(data=[5, 3, 1, 10], index=idx, dtype=int, name="sales"),
    QSeries(data=[False, False, True, False], index=idx, dtype=bool, name="taxed"),
]
dict_s = {x.name: x for x in s}
print(dict_s)

{'SKU': 'SKU'
__________________
|  0   |  X4E  |
__________________
|  1   |  T3B  |
__________________
|  2   |  F8D  |
__________________
|  3   |  C7X  |
__________________
dtype: <class 'str'>, 'price': 'price'
__________________
|  0   |  7.0  |
__________________
|  1   |  3.5  |
__________________
|  2   |  8.0  |
__________________
|  3   |  6.0  |
__________________
dtype: <class 'float'>, 'sales': 'sales'
__________________
|  0   |  5  |
__________________
|  1   |  3  |
__________________
|  2   |  1  |
__________________
|  3   |  10  |
__________________
dtype: <class 'int'>, 'taxed': 'taxed'
__________________
|  0   |  False  |
__________________
|  1   |  False  |
__________________
|  2   |  True  |
__________________
|  3   |  False  |
__________________
dtype: <class 'bool'>}


Passing just a dictionary triggers the following defaults:
- it checks whether an index object is provided
- if none is provided, it checks whether the series share a common index
- if neither works, it falls back to the general range index
- each column name is taken from the name of the corresponding series

Note that warnings are raised to make the user aware of anything that was inferred and might have been overlooked.
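
The fallback order described above can be sketched in plain Python. This is an illustration only: `resolve_index` is a hypothetical helper, not the QFrame constructor, and the length of the fallback range (the longest series) is an assumption.

```python
import warnings


def resolve_index(series_indexes, index=None):
    """Pick a frame index following the fallback order above (hypothetical helper)."""
    # 1. An explicitly provided index always wins.
    if index is not None:
        return list(index)
    warnings.warn("no index provided; inferring it from the data")
    if not series_indexes:
        return []
    # 2. Otherwise use the index shared by all series, if there is one.
    first = list(series_indexes[0])
    if all(list(ix) == first for ix in series_indexes):
        return first
    # 3. Otherwise fall back to a plain range index (length is an assumption).
    return list(range(max(len(ix) for ix in series_indexes)))


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the inference warning in this demo
    shared = resolve_index([[10, 11], [10, 11]])
    fallback = resolve_index([[10, 11], [20, 21]])
explicit = resolve_index([[10, 11]], index=["a", "b"])

print(shared)    # [10, 11]
print(fallback)  # [0, 1]
print(explicit)  # ['a', 'b']
```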

In [24]:
df = QFrame(data=dict_s)
print(df)


 __________________________________
| Idx | SKU | price | sales | taxed |
 __________________________________
|     | <class 'str'> | <class 'float'> | <class 'int'> | <class 'bool'> |
 __________________________________
|  0  |   X4E   |   7.0   |   5   |   False   |
 __________________________________
|  1  |   T3B   |   3.5   |   3   |   False   |
 __________________________________
|  2  |   F8D   |   8.0   |   1   |   True   |
 __________________________________
|  3  |   C7X   |   6.0   |   10   |   False   |
 __________________________________



In [28]:
# Passing the index and columns explicitly should give the same frame as above
df_clean = QFrame(
    data=dict_s,
    index=idx,
    columns=list(dict_s.keys())
)

assert df.index==df_clean.index
assert df.columns==df_clean.columns
assert df.data==df_clean.data

Requested test query

In [30]:
query_ser= df[
    (df["price"] + 5.0 > 10.0) & (df["sales"] > 3) & ~df["taxed"]
]["SKU"]
print(query_ser)

"SKU[price + <class 'float'> > <class 'float'> & sales > <class 'int'> & not taxed]"
__________________
|  0   |  X4E  |
__________________
|  3   |  C7X  |
__________________
dtype: <class 'str'>
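
As a sanity check on the result above, the same condition can be evaluated row by row in plain Python, without QPandas. The `rows` list simply restates the table used to build the frame.

```python
# The four rows of the table, restated as plain dictionaries.
rows = [
    {"SKU": "X4E", "price": 7.0, "sales": 5, "taxed": False},
    {"SKU": "T3B", "price": 3.5, "sales": 3, "taxed": False},
    {"SKU": "F8D", "price": 8.0, "sales": 1, "taxed": True},
    {"SKU": "C7X", "price": 6.0, "sales": 10, "taxed": False},
]

# Same predicate as the QFrame query: (price + 5 > 10) & (sales > 3) & ~taxed
matches = [
    row["SKU"]
    for row in rows
    if row["price"] + 5.0 > 10.0 and row["sales"] > 3 and not row["taxed"]
]

print(matches)  # ['X4E', 'C7X']
```

Only rows 0 and 3 satisfy all three conditions, which agrees with the series printed by the query.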
