Commit 037bd89

Updated Notebook
1 parent 80a57df commit 037bd89

1 file changed (+5, -1 lines changed)

SparkContext and RDD Basics.ipynb

Lines changed: 5 additions & 1 deletion
@@ -394,7 +394,11 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### Get another RDD with only the `distinct` elements"
+ "### Removing duplicates: Get another RDD with only the `distinct` elements\n",
+ "\n",
+ "The method `RDD.distinct()` Returns a new dataset that contains the distinct elements of the source dataset.\n",
+ "\n",
+ "**NOTE**: This operation requires a **shuffle** in order to detect duplication across partitions. **So, it is a slow operation.**"
  ]
  },
  {
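
The note added in this commit says that `distinct` needs a shuffle to detect duplicates across partitions. A minimal pure-Python sketch of that idea follows. It is a toy model, not Spark's implementation: the function name `simulated_distinct`, the list-of-lists "partitions", and the hash-based routing are all assumptions made for illustration.

```python
from collections import defaultdict


def simulated_distinct(partitions, num_partitions=None):
    """Toy model of a distinct over partitioned data (hypothetical helper,
    not part of Spark). Each inner list stands in for one partition."""
    if num_partitions is None:
        num_partitions = len(partitions)

    # Step 1: remove duplicates inside each partition. This needs no data
    # movement, but it cannot see duplicates held by other partitions.
    locally_unique = [set(part) for part in partitions]

    # Step 2: the "shuffle". Route every value to a target partition chosen
    # by its hash, so equal values from different sources end up together.
    shuffled = defaultdict(set)
    for part in locally_unique:
        for value in part:
            shuffled[hash(value) % num_partitions].add(value)

    # Step 3: each target partition now holds each of its values exactly once.
    return [sorted(shuffled[i]) for i in range(num_partitions)]


# The values 1, 2 and 3 each appear in more than one source partition.
partitions = [[1, 2, 2, 3], [3, 4, 4, 1], [5, 5, 2]]
result = simulated_distinct(partitions)
print(result)
```

Step 2 is the expensive part: in a real cluster, moving values between partitions means serializing them and sending them over the network, which is why the note flags `distinct` as comparatively slow.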
