# 595. Big Countries
### Table: World

| Column Name | Type    |
|-------------|---------|
| name        | varchar |
| continent   | varchar |
| area        | int     |
| population  | int     |
| gdp         | bigint  |

- `name` is the primary key (column with unique values) for this table.
- Each row of this table gives the name of a country, the continent to which it belongs, its area, its population, and its GDP.

---

### A country is considered **big** if:
- It has an area of at least **three million** (i.e., `3000000` km²), **OR**
- It has a population of at least **twenty-five million** (i.e., `25000000`)

Write a solution to find the `name`, `population`, and `area` of the big countries.

Return the result table in **any order**.
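
The rule above can be sketched in plain Python to make the inclusive `>=` comparisons and the OR explicit. The `Country` tuple and the names `is_big`, `big_countries`, `AREA_THRESHOLD`, and `POPULATION_THRESHOLD` are illustrative assumptions, not part of the problem statement.

```python
from typing import Iterable, List, NamedTuple, Tuple

AREA_THRESHOLD = 3_000_000         # km², taken from the problem statement
POPULATION_THRESHOLD = 25_000_000  # taken from the problem statement


class Country(NamedTuple):
    """Hypothetical in-memory stand-in for one row of the World table."""
    name: str
    continent: str
    area: int
    population: int
    gdp: int


def is_big(country: Country) -> bool:
    """Return True if either threshold is met; both comparisons are inclusive."""
    return (
        country.area >= AREA_THRESHOLD
        or country.population >= POPULATION_THRESHOLD
    )


def big_countries(rows: Iterable[Country]) -> List[Tuple[str, int, int]]:
    """Project (name, population, area) for every big country, in input order."""
    return [(c.name, c.population, c.area) for c in rows if is_big(c)]
```

A country that sits exactly on a threshold (for example, an area of exactly `3000000`) counts as big under this reading, because the statement says "at least".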

---

### Example 1:

**Input:**

| name        | continent | area    | population | gdp          |
|-------------|-----------|---------|------------|--------------|
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |

**Output:**

| name        | population | area    |
|-------------|------------|---------|
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |
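
As a quick cross-check of the example, the following is a minimal sketch that runs an equivalent SQL query against an in-memory SQLite database using Python's standard `sqlite3` module. The SQLite column types and the variable names (`conn`, `BIG_COUNTRIES_SQL`, `result`) are assumptions for illustration; the rows are the sample input above.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Assumed SQLite mapping of the World schema (SQLite integers are 64-bit,
# so the gdp values fit in an INTEGER column).
conn.execute(
    """
    CREATE TABLE World (
        name       TEXT PRIMARY KEY,
        continent  TEXT,
        area       INTEGER,
        population INTEGER,
        gdp        INTEGER
    )
    """
)

# Sample rows copied from the example input.
conn.executemany(
    "INSERT INTO World VALUES (?, ?, ?, ?, ?)",
    [
        ("Afghanistan", "Asia", 652230, 25500100, 20343000000),
        ("Albania", "Europe", 28748, 2831741, 12960000000),
        ("Algeria", "Africa", 2381741, 37100000, 188681000000),
        ("Andorra", "Europe", 468, 78115, 3712000000),
        ("Angola", "Africa", 1246700, 20609294, 100990000000),
    ],
)

BIG_COUNTRIES_SQL = """
    SELECT name, population, area
    FROM World
    WHERE area >= 3000000 OR population >= 25000000
"""

# Row order is unspecified by the problem, so sort only to get a stable comparison.
result = sorted(conn.execute(BIG_COUNTRIES_SQL).fetchall())
conn.close()
```

Both returned rows qualify on population alone; no country in the sample reaches the area threshold.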


In [0]:
# Databricks Notebook: PySpark Solution for "Big Countries" from World Table

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType
from pyspark.sql.functions import col

# Get the Spark session (Databricks notebooks already provide `spark`; getOrCreate() reuses it)
spark = SparkSession.builder.getOrCreate()

# Define schema for World table
schema = StructType([
    StructField("name", StringType(), False),  # primary key, so never null
    StructField("continent", StringType(), True),
    StructField("area", IntegerType(), True),
    StructField("population", IntegerType(), True),
    StructField("gdp", LongType(), True)
])

# Sample data
data = [
    ("Afghanistan", "Asia", 652230, 25500100, 20343000000),
    ("Albania", "Europe", 28748, 2831741, 12960000000),
    ("Algeria", "Africa", 2381741, 37100000, 188681000000),
    ("Andorra", "Europe", 468, 78115, 3712000000),
    ("Angola", "Africa", 1246700, 20609294, 100990000000)
]

# Create DataFrame
world_df = spark.createDataFrame(data, schema)


In [0]:
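# Compact form: filter and project in one chained expression
# (.display() is the Databricks notebook renderer, not part of open-source PySpark)
world_df.filter((col("area") >= 3000000) | (col("population") >= 25000000)) \
    .select("name", "population", "area").display()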

In [0]:

# Filter big countries: area >= 3000000 OR population >= 25000000.
# Each comparison is parenthesized because `|` binds tighter than `>=` in Python.
big_countries_df = world_df.filter(
    (col("area") >= 3000000) | (col("population") >= 25000000)
).select("name", "population", "area")

# Display result
display(big_countries_df)
