
Unable to load sasdata set into Spark #48

Closed
pathri-pk opened this issue Apr 16, 2019 · 6 comments

Comments

@pathri-pk

I'm using the code below to load a sample SAS data set into Spark and getting a timeout error. I tried increasing the timeout with the 'metadataTimeout' option, but it still times out while reading the metadata. Any help is appreciated.

import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /Spark/spark-2.4.1-bin-hadoop2.7/jars/spark-sas7bdat-2.1.0-s_2.11.jar pyspark-shell'
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("Spark App").getOrCreate()
df = spark.read.format("com.github.saurfang.sas.spark").load("airline.sas7bdat", forceLowercaseNames=True, inferLong=True)

Error:
Py4JJavaError: An error occurred while calling o226.load.
: java.util.concurrent.TimeoutException: Timed out after 60 sec while reading file metadata, file might be corrupt. (Change timeout with 'metadataTimeout' paramater)

@thesuperzapper
Collaborator

@pathri-pk are you able to post that SAS file? (You should be able to drag it into a reply to this issue, if it's not huge.)


jene4ekjene4ek commented Mar 27, 2020

Hi all. I have the same problem with this file. The file is not corrupted, because I can read it with pandas.
airline.zip


mazenora commented Apr 17, 2020

I had the same problem, so I read the file with pandas and then converted it to a Spark DataFrame.

import pandas as pd
from pyspark.sql.types import StructType

df = pd.read_sas('YOUR_FILE', format='sas7bdat')
mySchema = StructType([])  # fill in a suitable schema structure for Spark
spark_df = spark.createDataFrame(df, schema=mySchema)

@jene4ekjene4ek

I've solved this issue by using the parso and spark-sas7bdat jars.

@mazenora

Could you please share the code?

@jene4ekjene4ek

In my project folder I have a jars folder containing parso-2.0.10.jar and spark-sas7bdat-2.1.0-s_2.11.jar (screenshot attached), and I run my script as: spark-submit --jars jars/spark-sas7bdat-2.1.0-s_2.11.jar,jars/parso-2.0.10.jar script.py

spark = (SparkSession.builder
         .config("spark.jars.packages", "saurfang:spark-sas7bdat:2.1.0-s_2.11")
         .getOrCreate())

df = (spark.read.format('com.github.saurfang.sas.spark')
      .load("airline.sas7bdat"))

