# DataGrid - Getting Started

The easiest way to create a DataGrid is to import the class `DataGrid` from the `datagrid` package and create one:

In [1]:
from datagrid import DataGrid

In [2]:
dg = DataGrid()

That's it! Now you can begin adding rows to the DataGrid, either one at a time with `DataGrid.append()` or several at once with `DataGrid.extend()`:

In [3]:
dg.append([None, 2.0, "hello, world", True, "2023/12/01"])

Here we append a row with five values: a null value, a float, a string, a boolean, and a string in the YYYY/MM/DD date format. 

We can see the first few rows of the DataGrid with `DataGrid.head()`:

In [4]:
dg.head()

         row-id              A              B              C              D              E
              1           None            2.0   hello, world           True  2023-12-01 00


Note that an additional column, `row-id`, was added automatically. This column is always added and contains the row number, starting at 1. Because we didn't name the columns, DataGrid labeled them A through E.
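
To make the automatic naming concrete, here is a minimal Python sketch of how a table could label unnamed columns spreadsheet-style (A, B, C, ...) and prefix each row with a 1-based `row-id`. The helpers `column_letters` and `number_rows` are hypothetical and are not part of the `datagrid` package. The sketch continues past Z with AA, AB, and so on, which is an assumption about what happens with more than 26 columns.

```python
def column_letters(n):
    """Return n spreadsheet-style names: A..Z, then AA, AB, ... (hypothetical helper)."""
    names = []
    for i in range(n):
        name = ""
        k = i + 1  # bijective base-26 uses 1-based "digits"
        while k > 0:
            k, rem = divmod(k - 1, 26)
            name = chr(ord("A") + rem) + name
        names.append(name)
    return names


def number_rows(rows):
    """Prefix each row with a 1-based id, mirroring the `row-id` column (hypothetical)."""
    return [[row_id] + list(row) for row_id, row in enumerate(rows, start=1)]


header = ["row-id"] + column_letters(5)
rows = number_rows([[None, 2.0, "hello, world", True, "2023/12/01"]])
print(header)   # ['row-id', 'A', 'B', 'C', 'D', 'E']
print(rows[0])  # [1, None, 2.0, 'hello, world', True, '2023/12/01']
```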

In addition, you can see what DataGrid has inferred about the data so far:

In [5]:
dg.info()

DataGrid (in memory)
    Name   : Untitled
    Rows   : 1
    Columns: 5
#   Column                Non-Null Count DataGrid Type       
--- -------------------- --------------- --------------------
1   A                                  0 None                
2   B                                  1 FLOAT               
3   C                                  1 TEXT                
4   D                                  1 BOOLEAN             
5   E                                  1 DATETIME            


A few things to note so far:

First, some of these methods, such as `head()` and `info()`, mirror the methods of the same name on a pandas DataFrame and work similarly. However, most of what you can do with a DataFrame is not supported by a DataGrid. We'll see examples of what you can do with a DataGrid below.

Second, `dg.info()` reports that this is an "in memory" DataGrid. That is, it hasn't been saved to disk yet. At this stage of construction, DataGrid infers the type of each column from the values appended so far. Because the only value in column A is None, the `DataGrid Type` of column A is None, and its Non-Null Count is 0.
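
Here is a minimal sketch of per-value type inference that reproduces the types `info()` reported for the first row. The function `infer_type` and its rules are assumptions for illustration, not DataGrid's implementation. In particular, it only recognizes dates in the YYYY/MM/DD format used above, while the real library may accept others.

```python
import datetime


def infer_type(value):
    """Guess a DataGrid-style type name for a single value (hypothetical rules)."""
    if value is None:
        return None  # a null gives no information about the column type
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, datetime.datetime):
        return "DATETIME"
    if isinstance(value, str):
        try:
            # Assumption: only YYYY/MM/DD strings are treated as dates here
            datetime.datetime.strptime(value, "%Y/%m/%d")
            return "DATETIME"
        except ValueError:
            return "TEXT"
    return "TEXT"  # anything else falls back to text


row = [None, 2.0, "hello, world", True, "2023/12/01"]
print([infer_type(v) for v in row])
# [None, 'FLOAT', 'TEXT', 'BOOLEAN', 'DATETIME']
```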

Let's append another row and check the info again:

In [6]:
dg.append([5, 6, "another string", False, "2023/12/02"])

In [7]:
dg.info()

DataGrid (in memory)
    Name   : Untitled
    Rows   : 2
    Columns: 5
#   Column                Non-Null Count DataGrid Type       
--- -------------------- --------------- --------------------
1   A                                  1 INTEGER             
2   B                                  2 FLOAT               
3   C                                  2 TEXT                
4   D                                  2 BOOLEAN             
5   E                                  2 DATETIME            


The DataGrid is still in memory, but the DataGrid Type of column A is now INTEGER, inferred from the value 5. Column B remains FLOAT even though this row's value for it is the integer 6.

Let's add another row:

In [8]:
dg.append([4.0, 6, "another string", False, "2023/12/02"])

In [9]:
dg.info()

DataGrid (in memory)
    Name   : Untitled
    Rows   : 3
    Columns: 5
#   Column                Non-Null Count DataGrid Type       
--- -------------------- --------------- --------------------
1   A                                  2 FLOAT               
2   B                                  3 FLOAT               
3   C                                  3 TEXT                
4   D                                  3 BOOLEAN             
5   E                                  3 DATETIME            


After we appended the float 4.0 to column A, its type became FLOAT. DataGrid infers the most specific type that fits the data seen so far and generalizes that type (here, from INTEGER to FLOAT) as additional rows require.
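
Column A went from None to INTEGER to FLOAT as rows arrived. One way to model this is a hypothetical `widen` function that merges a column's current type with the type of each new value. The rules below (INTEGER and FLOAT merge to FLOAT, and any other mismatch falls back to TEXT) are inferred from the outputs in this notebook and from the later remark that TEXT is the most encompassing type. They are an assumption, not DataGrid's documented behavior.

```python
NUMERIC = {"INTEGER", "FLOAT"}


def widen(current, new):
    """Merge a column's current type with a new value's type (hypothetical rules)."""
    if current is None:
        return new  # the first non-null value decides the type
    if new is None or new == current:
        return current  # nulls and matching types change nothing
    if current in NUMERIC and new in NUMERIC:
        return "FLOAT"  # INTEGER and FLOAT generalize to FLOAT
    return "TEXT"  # any other mix falls back to the most general type


# Types of the values appended to column A so far: None, 5, 4.0
column_a = [None, "INTEGER", "FLOAT"]
history = []
column_type = None
for value_type in column_a:
    column_type = widen(column_type, value_type)
    history.append(column_type)

print(history)                      # [None, 'INTEGER', 'FLOAT']
print(widen("BOOLEAN", "INTEGER"))  # TEXT
```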

Now, let's save the DataGrid to disk:

In [10]:
dg.save()

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 32430.19it/s]

Saving datagrid to 'untitled.datagrid'...

At this point, the DataGrid operates in a different mode: all data is stored on disk, and column types are no longer allowed to change.

Let's add a new row, using the value 87 for the boolean column D:

In [11]:
dg.append([5.0, 6, "another string", 87, "2023/12/02"])

In [12]:
dg.info()

DataGrid (on disk)
    Name   : Untitled
    Rows   : 4
    Columns: 5
#   Column                Non-Null Count DataGrid Type       
--- -------------------- --------------- --------------------
1   A                                  3 FLOAT               
2   B                                  4 FLOAT               
3   C                                  4 TEXT                
4   D                                  4 BOOLEAN             
5   E                                  4 DATETIME            


Because column types are now fixed, the DataGrid Type of column D remains BOOLEAN. Had we appended this row before saving, the type of column D would have become TEXT, the most encompassing type.

What does DataGrid do with an 87 in a boolean column?

In [13]:
dg.head()

         row-id              A              B              C              D              E
              1           None            2.0   hello, world           True  2023-12-01 00
              2            5.0            6.0  another strin          False  2023-12-02 00
              3            4.0            6.0  another strin          False  2023-12-02 00
              4            5.0            6.0  another strin           True  2023-12-02 00


DataGrid converted the 87 to the boolean True, as shown in column D of row 4.
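
Converting 87 to True is consistent with Python's truthiness rule, under which any nonzero number is true. Below is a sketch of a hypothetical `to_boolean` converter for a column whose type is fixed. Whether DataGrid handles numbers and strings exactly this way is an assumption. Plain `bool()` would turn the string "False" into True, so the sketch handles text separately.

```python
def to_boolean(value):
    """Coerce a value for a column fixed as BOOLEAN (hypothetical rule)."""
    if value is None:
        return None  # nulls pass through unchanged
    if isinstance(value, str):
        # bool("False") is True in Python, so interpret text explicitly
        text = value.strip().lower()
        if text in ("true", "false"):
            return text == "true"
        raise ValueError(f"cannot convert {value!r} to BOOLEAN")
    return bool(value)  # numbers: 0 is False, any other value (e.g. 87) is True


print(to_boolean(87), to_boolean(0), to_boolean("False"))  # True False False
```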

But that doesn't mean it can convert any value to any type. Let's try to append the value "Nope!" to a float column:

In [14]:
dg.append(["Nope!", 6, "another string", 87, "2023/12/02"])

Exception: Invalid type for column 'A': value was 'Nope!', but should have been type 'FLOAT'

And indeed: nope, that doesn't work. "Nope!" cannot be converted to a float, so DataGrid raises an exception.
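
A type-fixed column can be modeled as a converter that either returns a value of the column's type or raises an error. The hypothetical `to_float` below reproduces the exception message shown above. The `InvalidTypeError` class is invented for this sketch, and accepting numeric strings such as "2.5" is an assumption about DataGrid's behavior.

```python
class InvalidTypeError(Exception):
    """Raised when a value can't be stored in a column's fixed type (hypothetical)."""


def to_float(column, value):
    """Coerce a value for a column fixed as FLOAT, or raise (hypothetical)."""
    if value is None:
        return None  # nulls are allowed in any column
    try:
        # Accepts ints, floats, and (as an assumption) numeric strings like "2.5"
        return float(value)
    except (TypeError, ValueError):
        raise InvalidTypeError(
            f"Invalid type for column {column!r}: value was {value!r}, "
            "but should have been type 'FLOAT'"
        ) from None


print(to_float("A", 6))  # 6.0
try:
    to_float("A", "Nope!")
except InvalidTypeError as err:
    print(err)  # Invalid type for column 'A': value was 'Nope!', but should have been type 'FLOAT'
```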

## Specifying Columns

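If you like, you can name the DataGrid and its columns when you first create it. As the save output below shows, the name also determines the filename: "Example 1" is saved as 'example-1.datagrid'.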

In [1]:
from datagrid import DataGrid 

In [2]:
dg = DataGrid(name="Example 1", columns=["Category", "Loss", "Fitness", "Timestamp"])
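
Rows are still appended as plain lists, so each value is matched to a column name by position. The loop in the next section passes Category, Loss, Fitness, and Timestamp in that order. The hypothetical helper `row_as_dict` below sketches this pairing and rejects rows of the wrong length. Whether and how DataGrid validates row length is an assumption.

```python
import datetime

COLUMNS = ["Category", "Loss", "Fitness", "Timestamp"]


def row_as_dict(columns, row):
    """Pair row values with column names by position (hypothetical helper)."""
    if len(row) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(row)}")
    return dict(zip(columns, row))


record = row_as_dict(COLUMNS, ["dog", -1.5, 7.2, datetime.datetime(2023, 12, 1)])
print(record["Loss"])  # -1.5
```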

## Making Large DataGrids

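Building a DataGrid in memory is much faster, so do that when the data fits. If you need to create a larger DataGrid, you can save it first and then call `append()` or `extend()` while it is on disk. Keep in mind that column types are fixed once the DataGrid has been saved. The example below builds 10,000 rows in memory and then saves them.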
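
For the on-disk route, one way to keep Python's own memory use flat is to generate rows lazily and pass them to `extend()` in fixed-size batches. This sketch assumes `extend()` accepts a list of rows, as described earlier. The helpers `random_rows` and `batches` and the batch size of 10,000 are hypothetical choices, not DataGrid recommendations.

```python
import datetime
import itertools
import random


def random_rows(n, seed=None):
    """Lazily yield n rows for the Category/Loss/Fitness/Timestamp columns."""
    rng = random.Random(seed)  # a private generator makes runs reproducible
    for _ in range(n):
        yield [
            rng.choice(["dog", "cat", "mouse", "duck"]),  # Category
            rng.random() - 2.0,                           # Loss, in [-2.0, -1.0)
            rng.random() * 10,                            # Fitness, in [0, 10)
            datetime.datetime.now(),                      # Timestamp
        ]


def batches(rows, size):
    """Group an iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical usage, assuming `dg` was created with the four columns above:
# dg.save()  # switch to on-disk mode first
# for batch in batches(random_rows(1_000_000), 10_000):
#     dg.extend(batch)
```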

In [3]:
import random
import datetime

In [4]:
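for _ in range(10000):
    # Values are given in column order: Category, Loss, Fitness, Timestamp
    dg.append([
        random.choice(["dog", "cat", "mouse", "duck"]),  # Category
        random.random() - 2.0,                           # Loss, in [-2.0, -1.0)
        random.random() * 10,                            # Fitness, in [0, 10)
        datetime.datetime.now(),                         # Timestamp
    ])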

In [5]:
dg.save()

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 245579.65it/s]


Saving datagrid to 'example-1.datagrid'...


In [7]:
dg.show()