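## How to Create a DataFrame in PySpark
PySpark can build a DataFrame from in-memory Python data, a pandas DataFrame, a file such as a CSV, or an existing RDD. The examples below assume an active `SparkSession` named `spark`.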

### 1. From a List of Tuples (Manually Created Data)

```python
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Sample data: one tuple per row
my_data = [(1, "Rohish"), (2, "Priya"), (3, "Smit")]

# Option 1: explicit schema as a StructType
my_schema = StructType(
    [
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
    ]
)

# Option 2: schema as a DDL string
schema1 = "id integer, name string"

# Option 3: column names only (types are inferred from the data)
schema2 = ["id", "name"]

# Create the DataFrame; my_schema or schema1 can be passed instead of schema2
my_df1 = spark.createDataFrame(data=my_data, schema=schema2)
my_df1.show()
```

```
+---+------+
| id|  name|
+---+------+
|  1|Rohish|
|  2| Priya|
|  3|  Smit|
+---+------+
```



### 2. From a List of Python Dictionaries

Each dictionary represents one row, and its keys become the column names.

```python
data = [{"ID": 1, "Name": "Rohish", "Age": 27},
        {"ID": 2, "Name": "Priya", "Age": 26},
        {"ID": 3, "Name": "Smit", "Age": 25}]

df = spark.createDataFrame(data)
df.show()
```

```
+---+---+------+
|Age| ID|  Name|
+---+---+------+
| 27|  1|Rohish|
| 26|  2| Priya|
| 25|  3|  Smit|
+---+---+------+
```
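
In the output above, the columns come back sorted by key (`Age`, `ID`, `Name`) rather than in the order the keys were written. If a specific column order matters, one option is to turn each dictionary into a tuple in a fixed order and pass the column names explicitly. This is a minimal sketch: `dicts_to_rows` and `frame_from_dicts` are hypothetical helper names, and `spark` is assumed to be an active `SparkSession`.

```python
from typing import Any, Dict, List, Sequence, Tuple


def dicts_to_rows(records: List[Dict[str, Any]],
                  columns: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Turn each dict into a tuple whose values follow `columns`.

    Keys missing from a record become None (hypothetical helper).
    """
    return [tuple(record.get(col) for col in columns) for record in records]


def frame_from_dicts(spark: Any,
                     records: List[Dict[str, Any]],
                     columns: Sequence[str]) -> Any:
    """Build a DataFrame whose columns follow `columns` exactly.

    `spark` is assumed to be an active SparkSession.
    """
    return spark.createDataFrame(dicts_to_rows(records, columns),
                                 schema=list(columns))


data = [
    {"ID": 1, "Name": "Rohish", "Age": 27},
    {"ID": 2, "Name": "Priya", "Age": 26},
    {"ID": 3, "Name": "Smit", "Age": 25},
]
rows = dicts_to_rows(data, ["ID", "Name", "Age"])
print(rows[0])  # (1, 'Rohish', 27)
```

With a session available, `frame_from_dicts(spark, data, ["ID", "Name", "Age"]).show()` should list the columns in that order. Calling `df.select("ID", "Name", "Age")` on the DataFrame created earlier is another way to reorder them.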



### 3. From a pandas DataFrame
An existing pandas DataFrame can be passed directly to `spark.createDataFrame()`:

```python
import pandas as pd

# Create pandas DataFrame
pandas_df = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Cathy"], "Age": [25, 30, 28]})

# Convert to PySpark DataFrame
pd_df = spark.createDataFrame(pandas_df)
pd_df.show()
```

```
+---+-----+---+
| ID| Name|Age|
+---+-----+---+
|  1|Alice| 25|
|  2|  Bob| 30|
|  3|Cathy| 28|
+---+-----+---+
```



**Convert a PySpark DataFrame to a pandas DataFrame**

To go the other way, call the `toPandas()` method on the PySpark DataFrame.

```python
# Convert PySpark DataFrame to pandas DataFrame
pandas_df = pd_df.toPandas()

# Display the pandas DataFrame
print(pandas_df)
```

```
   ID   Name  Age
0   1  Alice   25
1   2    Bob   30
2   3  Cathy   28
```
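
`toPandas()` collects every row onto the driver, so the result has to fit in driver memory. For larger DataFrames it can help to cap the row count before converting. This is a minimal sketch: `to_pandas_capped` is a hypothetical helper name, and the default cap of 1000 rows is an arbitrary assumption.

```python
from typing import Any


def to_pandas_capped(spark_df: Any, max_rows: int = 1000) -> Any:
    """Convert at most `max_rows` rows of a PySpark DataFrame to pandas.

    Hypothetical helper; the default cap is an arbitrary choice.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    # limit() is applied before collection, so only the capped rows
    # are brought back to the driver.
    return spark_df.limit(max_rows).toPandas()
```

For example, `to_pandas_capped(pd_df, 100)` would return a pandas DataFrame holding no more than 100 rows of `pd_df`.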


### 4. From a CSV File
```python
df = spark.read.csv("path/to/file.csv", header=True, inferSchema=True)
```
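
`inferSchema=True` makes Spark take an extra pass over the data to guess the column types. When the layout is already known, a schema can be supplied instead. The sketch below writes a small sample file with the standard library so there is something to read. The file contents, the column layout in `CSV_SCHEMA`, and the helper names are assumptions for illustration, and `spark` is assumed to be an active `SparkSession`.

```python
import csv
import os
import tempfile
from typing import Any

# Assumed column layout for the sample file, written as a DDL string.
CSV_SCHEMA = "ID int, Name string, Age int"


def write_sample_csv(path: str) -> None:
    """Write a three-row CSV with a header line (illustrative data)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ID", "Name", "Age"])
        writer.writerows([(1, "Alice", 25), (2, "Bob", 30), (3, "Cathy", 28)])


def read_csv_with_schema(spark: Any, path: str) -> Any:
    """Read the CSV with declared types instead of inferSchema.

    `spark` is assumed to be an active SparkSession.
    """
    return spark.read.csv(path, header=True, schema=CSV_SCHEMA)


# Create the sample file in a fresh temporary directory.
sample_path = os.path.join(tempfile.mkdtemp(), "people.csv")
write_sample_csv(sample_path)
```

With a session available, `read_csv_with_schema(spark, sample_path).printSchema()` should report the declared types rather than inferred ones.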


### 5. From an RDD
```python
rdd = spark.sparkContext.parallelize([(1, "Alice", 25), (2, "Bob", 30), (3, "Cathy", 28)])
df = rdd.toDF(["ID", "Name", "Age"])
df.show()
```