# DataFrame

Two-dimensional data structure representing data as a table with rows and columns.

### Parameters

1. data (dict, Sequence, ndarray, Series, or pandas.DataFrame)
   Two-dimensional data in various forms; dict input must contain Sequences,
   Generators, or a `range`. Sequence may contain Series or other Sequences.

2. schema (Sequence of str, (str, DataType) pairs, or a {str: DataType} dict)
   The schema of the resulting DataFrame. The schema may be declared in several
   ways:

   - As a dict of {name: type} pairs; if type is None, it will be auto-inferred.
   - As a list of column names; in this case types are automatically inferred.
   - As a list of (name, type) pairs; this is equivalent to the dictionary form.

   If you supply a list of column names that does not match the names in the
   underlying data, the names given here will overwrite them. The number
   of names given in the schema should match the underlying data dimensions.

   If set to `None` (default), the schema is inferred from the data.

3. schema_overrides (dict, default None)
   Specify or override the type of one or more columns; any dtypes inferred
   from the `schema` parameter are overridden for those columns.

   The number of entries in the schema should match the underlying data
   dimensions, unless a sequence of dictionaries is being passed, in which case
   a _partial_ schema can be declared to prevent specific fields from being loaded.

4. strict (bool, default True)
   Throw an error if any `data` value does not exactly match the given or inferred
   data type for that column. If set to `False`, values that do not match the data
   type are cast to that data type or, if casting is not possible, set to null
   instead.

5. orient ({'col', 'row'}, default None)
   Whether to interpret two-dimensional data as columns or as rows. If None,
   the orientation is inferred by matching the columns and data dimensions. If
   this does not yield conclusive results, column orientation is used.

6. infer_schema_length (int or None)
   The maximum number of rows to scan for schema inference. If set to `None`, the
   full data may be scanned _(this can be slow)_. This parameter only applies if
   the input data is a sequence or generator of rows; other input is read as-is.

7. nan_to_null (bool, default False)
   If the data comes from one or more numpy arrays, can optionally convert input
   data np.nan values to null instead. This is a no-op for all other input data.


In [291]:
import polars as pl

### Create DataFrames


In [292]:
df = pl.DataFrame({'first': [1, 2, 3], 'second': [4, 5, 6]})
df = pl.DataFrame([[1, 2, 3], [4, 5, 6]], schema=['first', 'second'])
df = pl.DataFrame(
    [pl.Series([1, 2, 3]), pl.Series([4, 5, 6])],
    schema=['first', 'second']
)

df

first,second
i64,i64
1,4
2,5
3,6


#### Set the DataFrame's data types explicitly


In [293]:
df = pl.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    # int will be converted to pl.Int64
    schema=[('first', pl.Int16), ('second', pl.Int32), ('third', int)]
)
df = pl.DataFrame(
    ([1, 2, 3], [4, 5, 6], [7, 8, 9]),
    # int will be converted to pl.Int64
    schema={'first': pl.Int16, 'second': pl.Int32, 'third': int}
)

df

first,second,third
i16,i32,i64
1,4,7
2,5,8
3,6,9


### Get DataFrame's column(s)


In [294]:
# 1. Select with square brackets (the __getitem__ method)
# Select by column name(s)
df['first', 'third']
df[['first', 'third']]
df[('first', 'third')]

# Two-argument form: df[row(s), column(s)]
df[:, ('first', 'third')]
df[:, (0, 2)]
df[:, -2:]

# 2. Select by calling the select method
# Select by column name(s)
df.select('first', 'third')
df.select(['first', 'third'])
df.select(('first', 'third'))

# Select by expression(s)
df.select(pl.col('first', 'third'))
df.select(pl.col('first'), pl.col('third'))
df.select([pl.col('first'), pl.col('third')])
df.select((pl.col('first'), pl.col('third')))

# Expressions can also transform the selected values, e.g. add 10 to each
df.select(pl.col('first') + 10)

first
i16
11
12
13


### Get DataFrame's row(s)


In [313]:
df[0]
df[:2]
df[1:-1]

# Get row(s) and specify the column(s)
df[0, -2:]
df[0, (0, 2)]

df[0, ('third', 'first')]
df[0, ('second',)]
df[(0, 2), :]

first,second,third
i16,i32,i64
1,4,7
3,6,9
