#### **array_contains**

- `array_contains` checks whether an **array column contains a specific value**.
- It is commonly used in **filtering** operations.
- It returns a **Boolean column** indicating the **presence** of the element in the array.
  - **True**: If the value is **present**.
  - **False**: If the value is **not present** and the array has no null elements.
  - **null**: If the array column is **null/None**, or if the value is not found and the array contains a **null element**.

#### **Syntax**

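     array_contains(array_column, value)

**array_column (str, Column):** The name of, or a Column expression for, a column of ArrayType.

**value:** The value to look for in the array. It should match the array's element type (for example a string or a number) and may also be given as a Column.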

**Returns**: BOOLEAN
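The three possible results listed above can be modeled in plain Python. This is only a sketch of the null-handling rules, not Spark's implementation; the helper name `array_contains_model` is hypothetical, and `None` stands in for SQL `NULL`.

```python
from typing import Any, Optional, Sequence


def array_contains_model(arr: Optional[Sequence[Any]], value: Any) -> Optional[bool]:
    """Plain-Python model of array_contains' three-valued result (illustrative only)."""
    # A null array yields null.
    if arr is None:
        return None
    # A match on any non-null element yields True.
    if any(x is not None and x == value for x in arr):
        return True
    # No match, but a null element means the answer is unknown: null.
    if any(x is None for x in arr):
        return None
    # No match and no null elements: False.
    return False


print(array_contains_model([1, 2, 3], 2))     # value present
print(array_contains_model([1, None, 3], 2))  # not found, array has a null element
print(array_contains_model([1, 4, 3], 2))     # not found, no null elements
print(array_contains_model(None, 2))          # null array
```

The four calls mirror the cases described above: present, not found with a null element, not found without nulls, and a null array.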

In [0]:
%sql
SELECT array_contains(array(1, 2, 3), 2) AS Boolean;

Boolean
True


In [0]:
%sql
SELECT array_contains(array(1, NULL, 3), 2) AS Boolean;

Boolean
""


In [0]:
%sql
SELECT array_contains(array(1, 4, 3), 2) AS Boolean;

Boolean
False


In [0]:
# Only array_contains is needed; the schema is inferred from the data below.
from pyspark.sql.functions import array_contains

In [0]:
data = [("Anand", ["Java","Scala","C++"], ["Spark","Java","Azure Databricks"], [8, 9, 5, 7]),
        ("Berne", ["Python","PySpark","C"], ["spark sql","ADF","SQL"], [11, 3, 6, 8]),
        ("Charan", ["Devops","VB","Git"], ["ApacheSpark","Python"], [5, 6, 8, 10]),
        ("Denish", ["SQL","Azure","AWS"], ["PySpark","Oracle","Confluence"], [12, 6, 8, 15]),
        ("Krishna", ["GCC","Visual Studio","Python"], ["SQL","Databricks","SQL Editor"], [2, 6, 5, 8]),
        ("Hari", ["Devops","VB","Git"], ["ApacheSpark","Python"], [5, 6, 8, 10]),
        ("Rakesh", ["SQL","Azure","AWS"], ["PySpark","Oracle","SQL"], [12, 6, 8, 15]),
        ("karan", ["AWS","Visual Studio","Python"], ["SQL","Git","SQL Editor"], [2, 6, 5, 8]),
        ("Eren", None, None, None)]
 
columns = ["Full_Name", "Languages", "New_Languages", "Experience"]
df = spark.createDataFrame(data, schema=columns)
df.printSchema()
display(df)

root
 |-- Full_Name: string (nullable = true)
 |-- Languages: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- New_Languages: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- Experience: array (nullable = true)
 |    |-- element: long (containsNull = true)



Full_Name,Languages,New_Languages,Experience
Anand,"List(Java, Scala, C++)","List(Spark, Java, Azure Databricks)","List(8, 9, 5, 7)"
Berne,"List(Python, PySpark, C)","List(spark sql, ADF, SQL)","List(11, 3, 6, 8)"
Charan,"List(Devops, VB, Git)","List(ApacheSpark, Python)","List(5, 6, 8, 10)"
Denish,"List(SQL, Azure, AWS)","List(PySpark, Oracle, Confluence)","List(12, 6, 8, 15)"
Krishna,"List(GCC, Visual Studio, Python)","List(SQL, Databricks, SQL Editor)","List(2, 6, 5, 8)"
Hari,"List(Devops, VB, Git)","List(ApacheSpark, Python)","List(5, 6, 8, 10)"
Rakesh,"List(SQL, Azure, AWS)","List(PySpark, Oracle, SQL)","List(12, 6, 8, 15)"
karan,"List(AWS, Visual Studio, Python)","List(SQL, Git, SQL Editor)","List(2, 6, 5, 8)"
Eren,,,


#### **1) How to check whether a value is present in an array column?**
- The function **checks** whether the specified **value is present** in an **array column**, row by row.

In [0]:
# Check whether each student knows Python.
df_con_py = df.select("Full_Name", "Languages", array_contains("Languages", "Python").alias("knows_python"))
display(df_con_py)

Full_Name,Languages,knows_python
Anand,"List(Java, Scala, C++)",False
Berne,"List(Python, PySpark, C)",True
Charan,"List(Devops, VB, Git)",False
Denish,"List(SQL, Azure, AWS)",False
Krishna,"List(GCC, Visual Studio, Python)",True
Hari,"List(Devops, VB, Git)",False
Rakesh,"List(SQL, Azure, AWS)",False
karan,"List(AWS, Visual Studio, Python)",True
Eren,,


In [0]:
# Check whether each student's new languages include SQL.
df_con_sql = df.withColumn("knows_sql", array_contains("New_Languages", "SQL"))\
               .select("Full_Name", "New_Languages", "knows_sql")
display(df_con_sql)

Full_Name,New_Languages,knows_sql
Anand,"List(Spark, Java, Azure Databricks)",False
Berne,"List(spark sql, ADF, SQL)",True
Charan,"List(ApacheSpark, Python)",False
Denish,"List(PySpark, Oracle, Confluence)",False
Krishna,"List(SQL, Databricks, SQL Editor)",True
Hari,"List(ApacheSpark, Python)",False
Rakesh,"List(PySpark, Oracle, SQL)",True
karan,"List(SQL, Git, SQL Editor)",True
Eren,,


In [0]:
# Check several array columns at once; each alias names the value being tested.
df_arr_con = df.select("Full_Name",
                       "Languages", array_contains(df.Languages, "SQL").alias("Knows_SQL"),
                       "New_Languages", array_contains(df.New_Languages, "PySpark").alias("Knows_PySpark"),
                       "Experience", array_contains(df.Experience, 8).alias("Contains_8"))
display(df_arr_con)

Full_Name,Languages,Knows_SQL,New_Languages,Knows_PySpark,Experience,Contains_8
Anand,"List(Java, Scala, C++)",False,"List(Spark, Java, Azure Databricks)",False,"List(8, 9, 5, 7)",True
Berne,"List(Python, PySpark, C)",False,"List(spark sql, ADF, SQL)",False,"List(11, 3, 6, 8)",True
Charan,"List(Devops, VB, Git)",False,"List(ApacheSpark, Python)",False,"List(5, 6, 8, 10)",True
Denish,"List(SQL, Azure, AWS)",True,"List(PySpark, Oracle, Confluence)",True,"List(12, 6, 8, 15)",True
Krishna,"List(GCC, Visual Studio, Python)",False,"List(SQL, Databricks, SQL Editor)",False,"List(2, 6, 5, 8)",True
Hari,"List(Devops, VB, Git)",False,"List(ApacheSpark, Python)",False,"List(5, 6, 8, 10)",True
Rakesh,"List(SQL, Azure, AWS)",True,"List(PySpark, Oracle, SQL)",True,"List(12, 6, 8, 15)",True
karan,"List(AWS, Visual Studio, Python)",False,"List(SQL, Git, SQL Editor)",False,"List(2, 6, 5, 8)",True
Eren,,,,,,


#### **2) How to filter records using array_contains()?**

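- To **keep only** the students **who know "Python"**, pass array_contains() as the filter condition. Rows whose array is null are dropped as well, because the condition evaluates to null rather than true.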

In [0]:
df_con_py_filt = df.select("Full_Name", "Languages") \
                   .filter(array_contains("Languages", "Python"))
display(df_con_py_filt)

Full_Name,Languages
Berne,"List(Python, PySpark, C)"
Krishna,"List(GCC, Visual Studio, Python)"
karan,"List(AWS, Visual Studio, Python)"
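The filter keeps a row only when the condition is true, which is why "Eren" (null array) is absent from the result. The plain-Python sketch below models that rule on three of the sample rows; it does not call Spark, and the helper names `contains` and `negate` are hypothetical. It also shows that negating the condition does not bring the null row back.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[List[str]]]

# A small subset of the sample data, including the row with a null array.
rows: List[Row] = [
    ("Berne", ["Python", "PySpark", "C"]),
    ("Denish", ["SQL", "Azure", "AWS"]),
    ("Eren", None),
]


def contains(arr: Optional[List[str]], value: str) -> Optional[bool]:
    """Model of array_contains for arrays without null elements (assumption)."""
    if arr is None:
        return None  # null array -> null result
    return value in arr


def negate(flag: Optional[bool]) -> Optional[bool]:
    """SQL-style NOT: the negation of null is still null."""
    return None if flag is None else not flag


# A filter keeps a row only when its condition is exactly True.
knows_python = [name for name, langs in rows if contains(langs, "Python") is True]
not_python = [name for name, langs in rows if negate(contains(langs, "Python")) is True]

print(knows_python)
print(not_python)
```

In this model "Eren" appears in neither list, so rows with null arrays need explicit handling if they should be kept.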


In [0]:
df_con_py_filt1 = df.select("Full_Name", "New_Languages") \
                    .filter(array_contains("New_Languages", "SQL"))
display(df_con_py_filt1)

Full_Name,New_Languages
Berne,"List(spark sql, ADF, SQL)"
Krishna,"List(SQL, Databricks, SQL Editor)"
Rakesh,"List(PySpark, Oracle, SQL)"
karan,"List(SQL, Git, SQL Editor)"
