Fix FordA download url in classification notebook (#309)

* fix ford_a download url
* fix url
* update changelog

---------

Co-authored-by: Egor Baturin <egoriyaa@github.com>
etna-team · May 13, 2024 · 010fa43
1 parent 97515b9

Showing 2 changed files with 19 additions and 13 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -42,7 +42,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - 
 
 ### Fixed
-- 
+- Fix FordA download url in classification notebook ([#309](https://github.com/etna-team/etna/pull/309))
 - 
 - 
 - 

diff --git a/examples/305-classification.ipynb b/examples/305-classification.ipynb
@@ -56,7 +56,15 @@
    "execution_count": 3,
    "id": "c085ebe2",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Disabling SSL verification.  Connections to this server are not verified and may be insecure!\n"
+     ]
+    }
+   ],
    "source": [
     "import pathlib\n",
     "\n",
@@ -90,9 +98,7 @@
    "source": [
     "### 1.1 Loading dataset <a class=\"anchor\" id=\"section_1_1\"></a>\n",
     "\n",
-    "Consider the example `FordA` dataset from [UCR archive](https://www.cs.ucr.edu/~eamonn/time_series_data/). Dataset consists of engine noise measurements and the problem is to diagnose whether a certain symptom exists in the engine. The comprehensive description of `FordA` dataset can be found [here](http://www.timeseriesclassification.com/description.php?Dataset=FordA). \n",
-    "\n",
-    "It is possible to load the dataset using `fetch_ucr_dataset` function from [`pyts` library](https://pyts.readthedocs.io/en/stable/index.html), but let's do it manually."
+    "Consider the example `FordA` dataset from [UCR archive](https://www.cs.ucr.edu/~eamonn/time_series_data/). Dataset consists of engine noise measurements and the problem is to diagnose whether a certain symptom exists in the engine."
    ]
   },
   {
@@ -107,13 +113,13 @@
      "text": [
       "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
       "                                 Dload  Upload   Total   Spent    Left  Speed\n",
-      "100 34.6M  100 34.6M    0     0  2585k      0  0:00:13  0:00:13 --:--:-- 2826k\n"
+      "100  301M  100  301M    0     0  4640k      0  0:01:06  0:01:06 --:--:-- 4085k33k      0  0:01:41  0:00:07  0:01:34 4195k   0  0:01:25  0:00:14  0:01:11 4251k    0  0:01:07  0:00:47  0:00:20 5043k\n"
      ]
     }
    ],
    "source": [
-    "!curl \"https://timeseriesclassification.com/aeon-toolkit/FordA.zip\" -o data/ford_a.zip\n",
-    "!unzip -q data/ford_a.zip -d data/ford_a"
+    "!curl https://www.cs.ucr.edu/~eamonn/time_series_data_2018/UCRArchive_2018.zip -o data/ucr_datasets.zip\n",
+    "!unzip -q -P someone -j data/ucr_datasets.zip 'UCRArchive_2018/FordA/*.tsv' -d data/"
    ]
   },
   {
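For readers who would rather not shell out to `curl` and `unzip`, the new download cell can be approximated in plain Python. This is a minimal, hypothetical equivalent, not code from the notebook. It reuses the archive URL and the `someone` password from the cell above. The helper names `download` and `extract_flat` are made up for illustration. It also assumes the archive uses the classic ZipCrypto encryption that `zipfile` can decrypt (AES-encrypted zips are not supported by `zipfile`). `extract_flat` drops directory components the way `unzip -j` does.

```python
import fnmatch
import pathlib
import shutil
import urllib.request
import zipfile
from typing import List, Optional

# URL and password copied from the notebook cell above.
UCR_URL = "https://www.cs.ucr.edu/~eamonn/time_series_data_2018/UCRArchive_2018.zip"
UCR_PASSWORD = b"someone"


def download(url: str, dest: pathlib.Path) -> pathlib.Path:
    """Download `url` to `dest`, roughly like `curl URL -o dest`."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


def extract_flat(
    zip_path: pathlib.Path,
    pattern: str,
    dest_dir: pathlib.Path,
    pwd: Optional[bytes] = None,
) -> List[pathlib.Path]:
    """Hypothetical helper: extract members whose name matches `pattern`
    into `dest_dir`, keeping only the base file name (like `unzip -j`).

    `pwd` is only consulted for encrypted members; this assumes ZipCrypto,
    since `zipfile` cannot decrypt AES-encrypted entries.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything outside the requested glob.
            if info.is_dir() or not fnmatch.fnmatchcase(info.filename, pattern):
                continue
            # Zip member names always use "/", so parse them as POSIX paths.
            target = dest_dir / pathlib.PurePosixPath(info.filename).name
            with archive.open(info, pwd=pwd) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

A call such as `extract_flat(download(UCR_URL, pathlib.Path("data/ucr_datasets.zip")), "UCRArchive_2018/FordA/*.tsv", pathlib.Path("data"), pwd=UCR_PASSWORD)` should leave `FordA_TRAIN.tsv` and `FordA_TEST.tsv` directly under `data/`. The whole archive is about 301 MB according to the curl output above, so the download dominates the runtime. Note that `fnmatch`'s `*` also matches `/`, which is looser than a shell glob but harmless for this pattern.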
@@ -123,9 +129,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "def load_ford_a(path: pathlib.Path, dataset_name: str):\n",
-    "    train_path = path / (dataset_name + \"_TRAIN.txt\")\n",
-    "    test_path = path / (dataset_name + \"_TEST.txt\")\n",
+    "def load_ford_a(path: str):\n",
+    "    train_path = path + \"_TRAIN.tsv\"\n",
+    "    test_path = path + \"_TEST.tsv\"\n",
     "    data_train = np.genfromtxt(train_path)\n",
     "    data_test = np.genfromtxt(test_path)\n",
     "\n",
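The hunk above only shows the first lines of the rewritten `load_ford_a`; the rest of its body is outside the diff context. A self-contained loader for the new `.tsv` layout might look like the sketch below. It is hypothetical (`load_ucr_split` does not exist in the notebook) and assumes the UCR 2018 convention of one series per row with the class label in the first column. It returns arrays in the same order the notebook unpacks them (`X_train, X_test, y_train, y_test`).

```python
from typing import Tuple

import numpy as np


def load_ucr_split(path_prefix: str) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Hypothetical loader for `<prefix>_TRAIN.tsv` / `<prefix>_TEST.tsv`.

    Assumes each row is `label<TAB>x_1<TAB>...<TAB>x_n`.
    """
    # genfromtxt's default delimiter=None splits on any whitespace, tabs included.
    # atleast_2d guards against genfromtxt returning a 1-D array for a single row.
    data_train = np.atleast_2d(np.genfromtxt(path_prefix + "_TRAIN.tsv"))
    data_test = np.atleast_2d(np.genfromtxt(path_prefix + "_TEST.tsv"))
    # Assumed layout: first column is the label, the rest are series values.
    X_train, y_train = data_train[:, 1:], data_train[:, 0]
    X_test, y_test = data_test[:, 1:], data_test[:, 0]
    return X_train, X_test, y_train, y_test
```

Taking a plain string prefix mirrors the new `load_ford_a(path: str)` signature, so `load_ucr_split("data/FordA")` would read the two files extracted by the download cell.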
@@ -145,14 +151,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "X_train, X_test, y_train, y_test = load_ford_a(pathlib.Path(\"data\") / \"ford_a\", \"FordA\")\n",
+    "X_train, X_test, y_train, y_test = load_ford_a(\"data/FordA\")\n",
     "y_train[y_train == -1], y_test[y_test == -1] = 0, 0  # transform labels to 0,1"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 7,
-   "id": "c6f62d48",
+   "id": "fa1581fb",
    "metadata": {},
    "outputs": [
     {
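The line `y_train[y_train == -1], y_test[y_test == -1] = 0, 0` relabels both arrays in place through boolean masks. A non-mutating variant is sketched below. `to_binary_labels` is a hypothetical helper, and it assumes, as the `# transform labels to 0,1` comment implies, that FordA labels are only `-1` and `1`; it fails loudly on anything else instead of silently passing other values through.

```python
import numpy as np


def to_binary_labels(y: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map labels {-1, 1} to {0, 1} without mutating `y`."""
    labels = set(np.unique(y).tolist())
    # Assumption: the only classes present are -1 and 1.
    if not labels <= {-1.0, 1.0}:
        raise ValueError(f"unexpected labels: {sorted(labels)}")
    # -1 becomes 0, 1 stays 1; the result is an integer array.
    return np.where(y == -1, 0, 1).astype(np.int64)
```

Returning a new array keeps the raw labels from the loader intact, which avoids surprises if the cell is re-run on already-converted arrays.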