
GC limit overhead exceeded because of temporary objects #29

Closed
extstmtrifork opened this issue Nov 13, 2018 · 4 comments

@extstmtrifork

extstmtrifork commented Nov 13, 2018

Hi,

I am trying to read from a CSV file containing a bit more than 2 million rows, then make a simple mapping to something I can use to finally insert it into a database. However, I am getting an error: "GC limit overhead exceeded", as it creates a lot of temporary objects.

I read the other issue regarding temporary objects; however, as far as I understand, that one is about writing to a CSV file, whereas I am getting this error while reading from a CSV file.

@dhoard

dhoard commented Nov 13, 2018

@extstmtrifork can you provide some sample code?

@extstmtrifork
Author

extstmtrifork commented Nov 14, 2018

So basically I have two files that share a common header called "PersonID".
I read the first file and insert the data into a HashMap (see code below).
Then I read the second file, where I use the HashMap to look up another header, "CivilRegistrationNumber", based on the "PersonID".
There are 14 headers (columns) in the second CSV file, all of which are strings.
I then use all the information to insert into a database.
```java
public Map<String, String> readingFileAtOnce(File file) throws IOException, InterruptedException {
    Map<String, String> personMap = new HashMap<>();

    CsvReader csvReader = new CsvReader();
    csvReader.setContainsHeader(true);
    csvReader.setTextDelimiter('\'');
    csvReader.setSkipEmptyRows(true);

    CsvParser csvParser = csvReader.parse(file, StandardCharsets.UTF_8);
    CsvRow row;
    boolean headersValidated = false;

    while ((row = csvParser.nextRow()) != null) {
        if (Thread.currentThread().isInterrupted()) {
            throw new InterruptedException();
        }

        // Validate the expected headers once, on the first row
        if (!headersValidated) {
            dataValidator.validateHeadersExists(csvParser.getHeader(),
                    Arrays.asList("PersonID", "CivilRegistrationNumber"));
            headersValidated = true;
        }

        try {
            dataValidator.validatePersonData(row.getField("PersonID"), row.getField("CivilRegistrationNumber"));
            personMap.put(row.getField("PersonID"), row.getField("CivilRegistrationNumber"));
        } catch (IllegalStateException | IllegalArgumentException e) {
            error++;
            log.error("...");
        }
    }
    return personMap;
}
```

@dhoard

dhoard commented Nov 14, 2018

This sounds like a JVM tuning issue ... How much heap memory are you allocating to the JVM?

IMO this design doesn't scale well.

You would be better off ...

  1. Sort both files by PersonID.
  2. Read a record from file 1.
  3. Read a record from file 2.
  4. Merge the records and write them to file 3.
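
For reference, below is a minimal sketch of that sort-merge approach, assuming both files are already pre-sorted by PersonID and reusing the FastCSV 1.x API from the snippet above (CsvReader, CsvParser, CsvRow). The class name and the handleMatch method are hypothetical placeholders; handleMatch stands in for whatever happens with a matched pair (e.g. the database insert or writing to file 3).

```java
import de.siegmar.fastcsv.reader.CsvParser;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SortedMergeExample {

    // Sketch only: merge two CSV files that are both pre-sorted by "PersonID",
    // keeping at most one row per file in memory at a time.
    public void mergeSortedFiles(File personFile, File detailFile) throws IOException {
        CsvReader csvReader = new CsvReader();
        csvReader.setContainsHeader(true);

        try (CsvParser persons = csvReader.parse(personFile, StandardCharsets.UTF_8);
             CsvParser details = csvReader.parse(detailFile, StandardCharsets.UTF_8)) {

            CsvRow personRow = persons.nextRow();
            CsvRow detailRow = details.nextRow();

            while (personRow != null && detailRow != null) {
                int cmp = personRow.getField("PersonID")
                        .compareTo(detailRow.getField("PersonID"));

                if (cmp == 0) {
                    // Same PersonID in both files: combine the rows.
                    handleMatch(personRow, detailRow);
                    detailRow = details.nextRow();
                } else if (cmp < 0) {
                    personRow = persons.nextRow();  // no detail row for this person yet
                } else {
                    detailRow = details.nextRow();  // no person row for this detail record
                }
            }
        }
    }

    private void handleMatch(CsvRow personRow, CsvRow detailRow) {
        // hypothetical placeholder: write to file 3 or insert into the database
    }
}
```

Note that the plain string compareTo only lines up with the file order if both files were sorted lexicographically by PersonID; numeric IDs would need a numeric sort and comparison.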

@osiegmar
Owner

Is the CSV file properly formatted? I know of situations where missing (closing) text delimiters result in huge column data.
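
To illustrate the kind of input described here, below is a small self-contained sketch (the file contents and class name are made up, and the exact output depends on the FastCSV version) that parses a file whose first data row is missing its closing ' text delimiter, using the same settings as the snippet above:

```java
import de.siegmar.fastcsv.reader.CsvParser;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MissingDelimiterExample {

    public static void main(String[] args) throws IOException {
        // The row for PersonID 1 is missing its closing ' delimiter, so the
        // following lines can end up inside that single field, producing the
        // kind of huge column data described above.
        String csv = "PersonID,CivilRegistrationNumber\n"
                + "1,'1234567890\n"        // <-- unclosed text delimiter
                + "2,'0987654321'\n"
                + "3,'1111111111'\n";

        Path tempFile = Files.createTempFile("broken", ".csv");
        Files.write(tempFile, csv.getBytes(StandardCharsets.UTF_8));

        CsvReader csvReader = new CsvReader();
        csvReader.setContainsHeader(true);
        csvReader.setTextDelimiter('\'');

        try (CsvParser parser = csvReader.parse(tempFile.toFile(), StandardCharsets.UTF_8)) {
            CsvRow row;
            while ((row = parser.nextRow()) != null) {
                String crn = row.getField("CivilRegistrationNumber");
                System.out.println("PersonID=" + row.getField("PersonID")
                        + ", CivilRegistrationNumber length=" + crn.length());
            }
        }
    }
}
```

Printing the field lengths makes it easy to spot a row whose quoted field has swallowed the data that should have belonged to the rows after it.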

@osiegmar osiegmar self-assigned this Feb 9, 2019
@osiegmar osiegmar closed this as completed Feb 9, 2019