# Pandas data types

[pandas arrays, scalars, and data types](https://pandas.pydata.org/docs/reference/arrays.html) list the data types.

> For most data types, pandas uses NumPy arrays as the concrete objects contained with a Index, Series, or DataFrame.
> For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.

The child pages list each dtypes.

* pandas.Int8Dtype
* pandas.Int16Dtype
* pandas.Int32Dtype
* pandas.Int64Dtype
* pandas.UInt8Dtype
* pandas.UInt16Dtype
* pandas.UInt32Dtype
* pandas.UInt64Dtype
* pandas.CategoricalDtype
* pandas.StringDtype
* pandas.BooleanDtype
* pandas.DatetimeTZDtype

In [1]:
import numpy as np
import pandas as pd

In [17]:
df = pd.read_json(
    "../data/recovery.json",
    dtype={
        "facility": pd.CategoricalDtype,
        "supplier": pd.CategoricalDtype,
        "supplierCode": pd.CategoricalDtype,
        "suppliedM3": np.float32,
        "recoveredM3": np.float32,
    },
    convert_dates=['date']
)
df.insert(3,'elapsed', pd.to_timedelta('00:' + df['processTime'], errors='coerce'))
df

Unnamed: 0,facility,timeStart,processTime,elapsed,supplier,suppliedM3,recoveredM3,date,timeEnd,supplierCode
0,Bundaberg,9/1/22 8:16 AM,4:05,0 days 00:04:05,Mary,5.09,4.13,NaT,,
1,Newcastle,8:29:00 AM,,NaT,,2.00,1.55,2022-09-01,9:07:00 AM,har
2,Newcastle,9:27:00 AM,,NaT,,6.80,4.15,2022-09-01,11:28:00 AM,dic
3,Newcastle,11:38:00 AM,,NaT,,1.95,1.55,2022-09-01,12:21:00 PM,har
4,Bundaberg,9/1/22 12:34 PM,1:50,0 days 00:01:50,Mary Therese,3.78,2.56,NaT,,
...,...,...,...,...,...,...,...,...,...,...
227,Newcastle,11:40:00 AM,,NaT,,3.70,2.35,2022-09-30,12:41:00 PM,tom
228,Newcastle,12:52:00 PM,,NaT,,6.35,4.55,2022-09-30,2:36:00 PM,dic
229,Bundaberg,9/30/22 1:48 PM,3:40,0 days 00:03:40,Mary Therese,4.53,2.73,NaT,,
230,Newcastle,3:02:00 PM,,NaT,,2.00,1.45,2022-09-30,3:42:00 PM,har


In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 232 entries, 0 to 231
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype          
---  ------        --------------  -----          
 0   facility      232 non-null    object         
 1   timeStart     232 non-null    object         
 2   processTime   111 non-null    object         
 3   supplier      111 non-null    object         
 4   suppliedM3    232 non-null    float32        
 5   recoveredM3   232 non-null    float32        
 6   date          121 non-null    datetime64[ns] 
 7   timeEnd       121 non-null    object         
 8   supplierCode  121 non-null    object         
 9   elapsed       111 non-null    timedelta64[ns]
dtypes: datetime64[ns](1), float32(2), object(6), timedelta64[ns](1)
memory usage: 16.4+ KB


In [4]:
df.describe()

Unnamed: 0,suppliedM3,recoveredM3
count,232.0,232.0
mean,4.141034,2.857543
std,1.369829,0.92911
min,1.9,1.2
25%,3.08,2.1875
50%,4.15,2.865
75%,5.05,3.58
max,6.95,5.5
