Unable to load SAS data set into Spark #48
Comments
@pathri-pk are you able to post that SAS file? (You should be able to drag it into a reply to this issue, if it's not huge.)
Hi all. I have the same problem with this file. The file is not corrupted, because I can read it with pandas.
I had the same problem, so I read the file with pandas and then converted it to a Spark DataFrame.
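A minimal sketch of that workaround, assuming pandas is installed and the whole file fits in driver memory (the file name reuses airline.sas7bdat from the original report):
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("Spark App").getOrCreate()

# Read the SAS file with pandas (pure-Python path, no extra jars required)
pdf = pd.read_sas("airline.sas7bdat", format="sas7bdat")

# Convert the pandas DataFrame to a Spark DataFrame
df = spark.createDataFrame(pdf)
df.printSchema()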
I've solved this issue by using the parso and spark-sas7bdat jars.
Could you please share the code?
In my project folder I have a jars folder with parso-2.0.10.jar and spark-sas7bdat-2.1.0-s_2.11.jar (screenshot attached), and I run my script as:
spark-submit --jars jars/spark-sas7bdat-2.1.0-s_2.11.jar,jars/parso-2.0.10.jar script.py
Inside script.py:
spark = SparkSession.builder.config("spark.jars.packages", "saurfang:spark-sas7bdat:2.1.0-s_2.11").getOrCreate()
df = spark.read.format('com.github.saurfang.sas.spark')
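For completeness, a runnable sketch of that approach; the file path is a placeholder, and the package coordinate, format name, and reader options are taken from the comments and code in this thread:
from pyspark.sql import SparkSession

# Resolve the reader and its parso dependency via spark.jars.packages
# (local jars passed with --jars, as in the spark-submit call above, work as well;
# resolving the saurfang coordinate may require the spark-packages repository to be reachable)
spark = (SparkSession.builder
         .config("spark.jars.packages", "saurfang:spark-sas7bdat:2.1.0-s_2.11")
         .getOrCreate())

# "path/to/file.sas7bdat" is a placeholder path
df = (spark.read
      .format("com.github.saurfang.sas.spark")
      .load("path/to/file.sas7bdat"))
df.show(5)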
I am using the code below to load a sample SAS data set into Spark and getting a timeout error. I tried increasing the timeout with the 'metadataTimeout' option, but it still times out while reading the metadata. Any help is appreciated.
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /Spark/spark-2.4.1-bin-hadoop2.7/jars/spark-sas7bdat-2.1.0-s_2.11.jar pyspark-shell'
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName("Spark App").getOrCreate()
df = spark.read.format("com.github.saurfang.sas.spark").load("airline.sas7bdat", forceLowercaseNames=True, inferLong=True)
Error:
Py4JJavaError: An error occurred while calling o226.load.
: java.util.concurrent.TimeoutException: Timed out after 60 sec while reading file metadata, file might be corrupt. (Change timeout with 'metadataTimeout' paramater)
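For reference, a sketch of how that option could be passed through the reader; the option name comes from the error message above, while the value and its unit (seconds) are assumptions:
# Same reader call as above, with a larger metadata timeout
df = spark.read.format("com.github.saurfang.sas.spark").load(
    "airline.sas7bdat",
    forceLowercaseNames=True,
    inferLong=True,
    metadataTimeout=300,  # assumed to be in seconds, matching the "60 sec" in the error
)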