## Selecting Columns

The `select()` method in PySpark projects a set of columns from a DataFrame, much like the SELECT clause in SQL. It returns a new DataFrame and leaves the original unchanged.


### Links and Resources
- [Select Function](https://spark.apache.org/docs/3.5.1/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.select.html#pyspark.sql.DataFrame.select)

In [0]:
# Reading the countries_delta data into a DataFrame
df = spark.read.format("delta").load("/FileStore/write_demo/countries_delta")

df.show(5)

+------------+---------+----------+--------+
|country_name|continent|population|area_km2|
+------------+---------+----------+--------+
| Afghanistan|     Asia|  38928346|  652230|
|     Albania|   Europe|   2877797|   28748|
|     Algeria|   Africa|  43851044| 2381741|
|     Andorra|   Europe|     77265|     468|
|      Angola|   Africa|  32866272| 1246700|
+------------+---------+----------+--------+
only showing top 5 rows



### Using select() with String Column Names

In [0]:
df.select("country_name", "population").display()

country_name,population
Sweden,10099265
Switzerland,8654622
Syria,17500657
Tajikistan,9537645
Tanzania,59734218
Thailand,69799978
Timor-Leste,1318445
Togo,8278724
Tonga,105695
Trinidad and Tobago,1399488


In [0]:
df.select("*").display()

country_name,continent,population,area_km2
Sweden,Europe,10099265,450295
Switzerland,Europe,8654622,41285
Syria,Asia,17500657,185180
Tajikistan,Asia,9537645,143100
Tanzania,Africa,59734218,945087
Thailand,Asia,69799978,513120
Timor-Leste,Asia,1318445,14874
Togo,Africa,8278724,56785
Tonga,Oceania,105695,747
Trinidad and Tobago,North America,1399488,5130


### Using select() with Column Objects

In [0]:
from pyspark.sql.functions import col

# A trailing backslash continues the chained call on the next line.
# Nothing may follow the backslash on the same line, not even a comment.
df.select(
    col("country_name").alias("country"),
    col("population")
) \
    .display()

country,population
Sweden,10099265
Switzerland,8654622
Syria,17500657
Tajikistan,9537645
Tanzania,59734218
Thailand,69799978
Timor-Leste,1318445
Togo,8278724
Tonga,105695
Trinidad and Tobago,1399488


### Using Attribute Access

In [0]:
df.select(df.country_name.alias("country"), df.population).display()

country,population
Sweden,10099265
Switzerland,8654622
Syria,17500657
Tajikistan,9537645
Tanzania,59734218
Thailand,69799978
Timor-Leste,1318445
Togo,8278724
Tonga,105695
Trinidad and Tobago,1399488


### Using Bracket Notation

In [0]:
df.select(df["country_name"].alias("country"), df["population"]).display()

country,population
Sweden,10099265
Switzerland,8654622
Syria,17500657
Tajikistan,9537645
Tanzania,59734218
Thailand,69799978
Timor-Leste,1318445
Togo,8278724
Tonga,105695
Trinidad and Tobago,1399488
