[DOC] add examples for loading data from common tabular csv formats #4612

Merged
merged 7 commits into from
May 24, 2023
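
As a rough sketch of the loading pattern that the notebook cells in this diff walk through, the following uses plain pandas on in-memory CSV text. The CSV contents, the column names `instance` and `dim_0`, and the helper names `read_series` and `read_panel` are assumptions made for illustration; they are not part of sktime or of this PR.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for files on disk.
SERIES_CSV = """Period,Number of airline passengers
1949-01,112.0
1949-02,118.0
1949-03,132.0
"""

PANEL_CSV = """instance,timepoints,dim_0
0,0,-1.96
0,1,-1.95
1,0,-1.77
1,1,-1.74
"""


def read_series(source, time_col):
    """Read a single-series CSV into a pd.Series with a DatetimeIndex."""
    df = pd.read_csv(source)
    # one remaining value column -> squeeze it into a Series
    series = df.set_index(time_col).squeeze("columns")
    series.index = pd.DatetimeIndex(series.index)
    return series


def read_panel(source, instance_col, time_col):
    """Read a long-format panel CSV into an (instance, time) MultiIndex frame."""
    df = pd.read_csv(source)
    # instance index first, time index second
    return df.set_index([instance_col, time_col]).sort_index()


y = read_series(io.StringIO(SERIES_CSV), time_col="Period")
X = read_panel(io.StringIO(PANEL_CSV), instance_col="instance", time_col="timepoints")
```

With sktime installed, the results could then be checked the same way the notebook does, for example `mtype(y, as_scitype="Series")` and `mtype(X, as_scitype="Panel")`.
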
127 changes: 123 additions & 4 deletions examples/AA_datatypes_and_datasets.ipynb
@@ -761,7 +761,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 3: loading data sets\n",
"## Section 3: loading pre-defined data sets\n",
"\n",
"`sktime`'s `datasets` module allows to load datasets for testing and benchmarking. This includes:\n",
"\n",
@@ -1041,6 +1041,116 @@
"source": [
"This will download the dataset into a local directory (by default: for a local clone, the `datasets/data` directory in the local repository; for a release install, in the local python environment folder). To change that directory, specify it using the `extract_path` argument of the `load_UCR_UEA_dataset` function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 4: loading data from files\n",
"\n",
"In older versions of sktime, there were functions of loading data directly from files. This part has been partly deprecated such that you can only load files from .ts files. Hence, we will show here how to do such data loading by youself. \n",
"\n",
"We'll cover:\n",
"\n",
"* converting series datasets to sktime compatible containers\n",
"* converting panel datasets to sktime compatible containers\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from sktime.datasets import load_airline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#mimicing a scenario where you already have a csv file\n",
"df_series = load_airline()\n",
"df_series.to_csv('./series_data(airplane).csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_series = pd.read_csv('./series_data(airplane).csv') #replace in the parenthesis the actual filename of your csv file\n",
"os.remove('./series_data(airplane).csv') #delete this line\n",
"df_series = df_series.set_index('Period').squeeze() #replace \"Period\" with the column name of the time index\n",
"df_series.index = pd.DatetimeIndex(df_series.index) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'pd.Series'"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mtype(df_series,as_scitype='Series')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#mimicing a scenario where you already have a csv file\n",
"df_multi_idx = load_arrow_head()[0]\n",
"df_multi_idx = convert_to(df_multi_idx,'pd-multiindex')\n",
"df_multi_idx.to_csv('./panel_multiindex(arrow).csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_panel = pd.read_csv('./panel_multiindex(arrow).csv')\n",
"os.remove('./panel_multiindex(arrow).csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'pd-multiindex'"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mtype(df_panel.set_index(['timepoints','Unnamed: 0']),as_scitype='Panel') \n",
"#replace \"timepoints\" with the time index column name \n",
"#replace \"Unnamed:0\" with the higher level column name of your file"
]
}
],
"metadata": {
@@ -1049,12 +1159,21 @@
"hash": "bc250fec99d1b72e5bb23d9fb06e1f1ac90e860438a1535c061277d2caf5ebfc"
},
"kernelspec": {
"display_name": "Python 3.7.10 64-bit ('sktime': conda)",
"name": "python3"
"display_name": "kernel_ts",
"language": "python",
"name": "kernel_ts"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"version": ""
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,