### 1. What is take() in PySpark?

The take(n) function in PySpark is used to retrieve the first n rows from a DataFrame quickly.

It returns a list of Row objects, not a DataFrame.

In PySpark, take(n) is equivalent to limit(n).collect(): Spark applies the limit first, so only n rows are sent to the driver instead of the whole DataFrame.

It’s useful when you need a small sample for inspection.

### 2. Syntax
DataFrame.take(num)


Parameters:

num → The number of rows to return.

Returns:

A list of Row objects.

### 3. Sample DataFrame

Let's create a sample PySpark DataFrame:

In [7]:
# Sample data
data = [
    (1, "Alice", 29),
    (2, "Bob", 31),
    (3, "Charlie", 25),
    (4, "David", 40),
    (5, "Eva", 35)
]

# Define schema
columns = ["ID", "Name", "Age"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# display DataFrame
display(df)



In [5]:
# Using take() to get the first N rows
# Take first 3 rows
first_three = df.take(3)

# Print result
for row in first_three:
    print(row)


Row(ID=1, Name='Alice', Age=29)
Row(ID=2, Name='Bob', Age=31)
Row(ID=3, Name='Charlie', Age=25)


### 4. Convert take() Output Back to a DataFrame

Because take() returns a list of Row objects, pass that list and the original schema to spark.createDataFrame() to work with it as a DataFrame again:

In [8]:
df_first_three = spark.createDataFrame(df.take(3), df.schema)
display(df_first_three)


### 5. Alternative Options in Fabric Notebook
| **Method**    | **Returns**  | **Use Case**                          |
| ------------- | ------------ | ------------------------------------- |
| `df.take(n)`  | List of Rows | Fetch the first N rows to the driver  |
| `df.head(n)`  | List of Rows | Same as `take()`                      |
| `df.limit(n)` | DataFrame    | Use if you want a DataFrame           |
| `display(df)` | Visual table | For Notebook UI                       |
| `df.show(n)`  | Prints rows  | Quick preview in logs                 |


### 6. Example: Accessing Specific Columns

In [9]:
# Take first 2 rows and access columns
rows = df.take(2)
for r in rows:
    print(f"Name: {r['Name']}, Age: {r['Age']}")


Name: Alice, Age: 29
Name: Bob, Age: 31


### 7. Key Notes for Microsoft Fabric Notebooks
Spark session is already initialized → use spark directly.

Use display(df) instead of df.show() for a better table UI.

For small samples, prefer take() or head().

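To work with the data in pandas (for example, before exporting it), use toPandas(). It collects every row to the driver, so apply limit(n) first on large DataFrames: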

In [10]:
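# take() returns a list of Row objects; it is not the last expression, so the notebook does not display it
df.take(5)
# limit() keeps the result a DataFrame, which toPandas() converts to a pandas DataFrame on the driver
df.limit(5).toPandas()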


|     | ID  | Name    | Age |
| --- | --- | ------- | --- |
| 0   | 1   | Alice   | 29  |
| 1   | 2   | Bob     | 31  |
| 2   | 3   | Charlie | 25  |
| 3   | 4   | David   | 40  |
| 4   | 5   | Eva     | 35  |
