
GC limit overhead exceeded because of temporary objects #29

Closed
extstmtrifork opened this issue Nov 13, 2018 · 4 comments

@extstmtrifork

extstmtrifork commented Nov 13, 2018

Hi,

I am trying to read from a CSV file containing a bit more than 2 million rows, then make a simple mapping to something I can use to finally insert it into a database. However, I am getting an error: "GC limit overhead exceeded", as it creates a lot of temporary objects.

I read the other issue regarding temporary objects; however, as far as I understand, that one is about writing to a CSV file, whereas I am getting this error while reading from a CSV file.

@dhoard

dhoard commented Nov 13, 2018

@extstmtrifork can you provide some sample code?

@extstmtrifork
Author

extstmtrifork commented Nov 14, 2018

So basically I have two files that share a common header called "PersonID".
I read the first file and insert the data into a HashMap (see code below).
Then I read the second file, where I use the HashMap to look up another header, "CivilRegistrationNumber", based on the "PersonID".
There are 14 headers (columns) in the second CSV file, all of which are strings.
I then use all the information to insert into a database.
```java
public Map<String, String> readingFileAtOnce(File file) throws IOException, InterruptedException {
    Map<String, String> personMap = new HashMap<>();

    CsvReader csvReader = new CsvReader();
    csvReader.setContainsHeader(true);
    csvReader.setTextDelimiter('\'');
    csvReader.setSkipEmptyRows(true);

    CsvParser csvParser = csvReader.parse(file, StandardCharsets.UTF_8);
    CsvRow row;
    boolean headersValidated = false;

    while ((row = csvParser.nextRow()) != null) {
        if (Thread.currentThread().isInterrupted()) {
            throw new InterruptedException();
        }

        // Validate the expected headers once, on the first row
        if (!headersValidated) {
            dataValidator.validateHeadersExists(csvParser.getHeader(),
                    Arrays.asList("PersonID", "CivilRegistrationNumber"));
            headersValidated = true;
        }

        try {
            dataValidator.validatePersonData(row.getField("PersonID"), row.getField("CivilRegistrationNumber"));
            personMap.put(row.getField("PersonID"), row.getField("CivilRegistrationNumber"));
        } catch (IllegalStateException | IllegalArgumentException e) {
            error++;
            log.error("...");
        }
    }
    return personMap;
}
```

@dhoard

dhoard commented Nov 14, 2018

This sounds like a JVM tuning issue ... How much heap memory are you allocating to the JVM?

IMO this design doesn't scale well.

You would be better off ...

  1. Sort both files by PersonID.
  2. Read a record from file 1.
  3. Read a record from file 2.
  4. Merge the records and write them to file 3.
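
For reference, below is a minimal sketch of that sort-merge approach, assuming both files are already pre-sorted by PersonID and reusing the FastCSV 1.x API from the snippet above (CsvReader, CsvParser, CsvRow). The class name and the handleMatch method are hypothetical placeholders; handleMatch stands in for whatever happens with a matched pair (e.g. the database insert or writing to file 3).

```java
import de.siegmar.fastcsv.reader.CsvParser;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SortedMergeExample {

    // Sketch only: merge two CSV files that are both pre-sorted by "PersonID",
    // keeping at most one row per file in memory at a time.
    public void mergeSortedFiles(File personFile, File detailFile) throws IOException {
        CsvReader csvReader = new CsvReader();
        csvReader.setContainsHeader(true);

        try (CsvParser persons = csvReader.parse(personFile, StandardCharsets.UTF_8);
             CsvParser details = csvReader.parse(detailFile, StandardCharsets.UTF_8)) {

            CsvRow personRow = persons.nextRow();
            CsvRow detailRow = details.nextRow();

            while (personRow != null && detailRow != null) {
                int cmp = personRow.getField("PersonID")
                        .compareTo(detailRow.getField("PersonID"));

                if (cmp == 0) {
                    // Same PersonID in both files: combine the rows.
                    handleMatch(personRow, detailRow);
                    detailRow = details.nextRow();
                } else if (cmp < 0) {
                    personRow = persons.nextRow();  // no detail row for this person yet
                } else {
                    detailRow = details.nextRow();  // no person row for this detail record
                }
            }
        }
    }

    private void handleMatch(CsvRow personRow, CsvRow detailRow) {
        // hypothetical placeholder: write to file 3 or insert into the database
    }
}
```

Note that the plain string compareTo only lines up with the file order if both files were sorted lexicographically by PersonID; numeric IDs would need a numeric sort and comparison.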

@osiegmar
Owner

Is the CSV file properly formatted? I know of situations where missing (closing) text delimiters result in huge column data.
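
To illustrate the kind of input described here, below is a small self-contained sketch (the file contents and class name are made up, and the exact output depends on the FastCSV version) that parses a file whose first data row is missing its closing ' text delimiter, using the same settings as the snippet above:

```java
import de.siegmar.fastcsv.reader.CsvParser;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MissingDelimiterExample {

    public static void main(String[] args) throws IOException {
        // The row for PersonID 1 is missing its closing ' delimiter, so the
        // following lines can end up inside that single field, producing the
        // kind of huge column data described above.
        String csv = "PersonID,CivilRegistrationNumber\n"
                + "1,'1234567890\n"        // <-- unclosed text delimiter
                + "2,'0987654321'\n"
                + "3,'1111111111'\n";

        Path tempFile = Files.createTempFile("broken", ".csv");
        Files.write(tempFile, csv.getBytes(StandardCharsets.UTF_8));

        CsvReader csvReader = new CsvReader();
        csvReader.setContainsHeader(true);
        csvReader.setTextDelimiter('\'');

        try (CsvParser parser = csvReader.parse(tempFile.toFile(), StandardCharsets.UTF_8)) {
            CsvRow row;
            while ((row = parser.nextRow()) != null) {
                String crn = row.getField("CivilRegistrationNumber");
                System.out.println("PersonID=" + row.getField("PersonID")
                        + ", CivilRegistrationNumber length=" + crn.length());
            }
        }
    }
}
```

Printing the field lengths makes it easy to spot a row whose quoted field has swallowed the data that should have belonged to the rows after it.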

@osiegmar osiegmar self-assigned this Feb 9, 2019
@osiegmar osiegmar closed this as completed Feb 9, 2019