Doc / Update documentation #45
jibidus committed Dec 6, 2019
2 parents e8d146e + 0b4d232 commit 04993ef
Showing 6 changed files with 101 additions and 16 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -18,7 +18,6 @@ build/
## Documentation cache and generated files:
/.yardoc/
/_yardoc/
-/doc/
/rdoc/

## Environment normalization:
23 changes: 15 additions & 8 deletions README.md
@@ -23,17 +23,19 @@ Like all benchmarks, some tuning can produce different results, yet this chart g

- Usual ActiveRecord process (validations, callbacks, computed fields like `created_at`...) is bypassed. This is the price for performance
- Custom enclosing field (ex: `"`) is not supported yet
-- Custom line serparator (ex: `\r\n` for windows file) is not supported yet
+- Custom line separator (ex: `\r\n` for Windows files) is not supported yet
- MySQL: encoding is not supported yet
- MySQL: transaction is not supported yet
- MySQL: row_index is not supported yet

+Note about custom line separators: it might work if you open the file with the `universal_newline` option (e.g. `file = File.new(path, universal_newline: true)`). Unfortunately, we were not able to reproduce and test it, so we don't support it "officially". You can find more information in [this ticket](https://github.com/sogilis/csv_fast_importer/pull/45#issuecomment-326578839) (in French).
+
## Installation

Add the dependency to your Gemfile:

-```gemfile
-gem 'csv_fast_importer`
+```ruby
+gem 'csv_fast_importer'
```

Run `bundle install`.
@@ -71,19 +73,18 @@ For instance, a `FIRSTNAME` CSV column will be mapped to the `firstname` field.

| Option key | Purpose | Default value |
| ------------ | ------------- | ------------- |
-| *encoding* | File encoding. *PostgreSQL only*| `'UTF-8'` |
+| *encoding* | File encoding. *PostgreSQL only* (see [FAQ](doc/faq.md) for more details) | `'UTF-8'` |
| *col_sep* | Column separator in file | `';'` |
| *destination* | Destination table | given base filename (without extension) |
| *mapping* | Column mapping | `{}` |
| *row_index_column* | Column in which the file row index is inserted (not used when `nil`). *PostgreSQL only* | `nil` |
| *transaction* | Execute DELETE and INSERT in same transaction. *PostgreSQL only* | `:enabled` |
| *deletion* | Row deletion method (`:delete` for SQL DELETE, `:truncate` for SQL TRUNCATE or `:none` for no deletion before import) | `:delete` |

-Your CSV file should be encoding in UTF-8 but you can specify another encoding
-with the `encoding` option (*PostgreSQL only*).
+If your CSV file does not use the same encoding as your database, you can specify the encoding when opening the file (see [FAQ](doc/faq.md) for more details):

```ruby
-CsvFastImporter.import file, encoding: 'ISO-8859-1'
+file = File.new '/path/to/knights.csv', encoding: 'ISO-8859-1'
```

You can specify a different separator column with the `col_sep` option (`;` by
@@ -117,9 +118,13 @@ Lancelot;lancelot@logre.cel
To map the `KNIGHT_EMAIL` column to the `email` database field:

```ruby
-CsvFastImporter.import file, mapping: { email: :knight_email }
+CsvFastImporter.import file, mapping: { knight_email: :email }
```

+## Need help?
+
+See [FAQ](doc/faq.md).
+
## How to contribute?

You can fork and submit a new pull request (with tests and explanations).
@@ -136,6 +141,8 @@ $ bundle exec rake test:db:create
```
This will connect to the `localhost` PostgreSQL database without a user (see `config/database.postgres.yml`) and create a new database dedicated to tests.

+*Warning:* the database instance must allow database creation with `UTF-8` encoding.
+
Finally, you can run all tests with RSpec like this:

```sh
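
The README change above flips the `mapping` example from `{ email: :knight_email }` to `{ knight_email: :email }`. Keys now name CSV columns and values name table fields. The sketch below illustrates that direction with a hypothetical `target_fields` helper. It is not CSV Fast Importer's implementation. Following the `FIRSTNAME` → `firstname` example, it assumes CSV headers are compared in lowercase:

```ruby
# Hypothetical helper (not part of CSV Fast Importer): returns the table
# field each CSV header would feed, assuming headers are lowercased and
# `mapping` goes from CSV column to table field.
def target_fields(headers, mapping = {})
  headers.map do |header|
    column = header.downcase.to_sym
    mapping.fetch(column, column) # unmapped columns keep their own name
  end
end

p target_fields(%w[NAME KNIGHT_EMAIL], { knight_email: :email })
# => [:name, :email]
```
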
2 changes: 1 addition & 1 deletion Rakefile
@@ -13,7 +13,7 @@ namespace :test do
when :postgres
require 'pg'
db.connect 'postgres'
-ActiveRecord::Base.connection.execute "CREATE DATABASE #{db.name}"
+ActiveRecord::Base.connection.execute "CREATE DATABASE #{db.name} ENCODING='UTF-8'"

when :mysql
require 'mysql2'
64 changes: 64 additions & 0 deletions doc/faq.md
@@ -0,0 +1,64 @@
# Frequently Asked Questions

## How to specify encoding?

Multiple components are involved when `CSV Fast Importer` is executed:

- file
- ruby `File` wrapper
- database client (managed by `ActiveRecord` connection)
- SQL command (`COPY` for PostgreSQL)
- database server

Encoding must be consistent across all these components. Here is how to specify or check the encoding of each component.

### File

You can get the current file encoding with the `file -i [file_path]` command (`-I` on macOS).
Some tools like [iconv](http://www.gnu.org/savannah-checkouts/gnu/libiconv/documentation/libiconv-1.15/iconv.1.html) can convert a file to another encoding.
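
If `file` is not available, you can roughly check from Ruby whether the raw bytes form valid UTF-8. This is only a heuristic, because every byte sequence is valid ISO-8859-1, for instance. `utf8_file?` is a hypothetical helper name:

```ruby
require 'tempfile'

# Hypothetical helper: true when the raw bytes of the file are valid UTF-8.
# It cannot tell which other encoding a non-UTF-8 file actually uses.
def utf8_file?(path)
  File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
end

Tempfile.create('knights') do |tmp|
  tmp.binmode
  tmp.write('trépassé'.encode('ISO-8859-1')) # each é becomes the byte 0xE9
  tmp.flush
  puts utf8_file?(tmp.path) # => false
end
```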

### Ruby `File` wrapper

`File` uses the default Ruby encoding (given by `Encoding.default_external`, see [External / Internal Encoding](https://ruby-doc.org/core-2.4.1/Encoding.html#class-Encoding-label-External+encoding)), which might be different from the file encoding!

```ruby
File.new 'path/to/file.csv'
```
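
To see which encoding Ruby assumes when none is given, compare `IO#external_encoding` with `Encoding.default_external`. In this minimal sketch, a temporary file stands in for a real CSV file:

```ruby
require 'tempfile'

Tempfile.create('knights') do |tmp|
  tmp.binmode
  tmp.write('trépassé'.encode('ISO-8859-1'))
  tmp.flush

  File.open(tmp.path) do |file| # no encoding given, like File.new above
    puts file.external_encoding == Encoding.default_external # => true
    # The ISO-8859-1 bytes are only labelled with the default encoding, not
    # converted, so with a UTF-8 (or US-ASCII) default the result is invalid.
    puts file.read.valid_encoding?
  end
end
```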

But you can specify the encoding with the `encoding` parameter:

```ruby
File.new 'path/to/file.csv', encoding: 'ISO-8859-1'
```

Ruby `File` can also handle internal and external encodings (see [File::new](https://ruby-doc.org/core-2.4.1/File.html#method-c-new)), which is useful for automatic conversion:

```ruby
File.new 'path/to/file.csv', external_encoding: 'ISO-8859-1', internal_encoding: 'UTF-8'
# or
File.new 'path/to/file.csv', encoding: 'ISO-8859-1:UTF-8'
```
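
A round trip is a simple way to check such a conversion before importing. Write ISO-8859-1 bytes, then reopen them with `'ISO-8859-1:UTF-8'`. The content read back should be valid UTF-8. Here is a minimal sketch using a temporary file:

```ruby
require 'tempfile'

Tempfile.create('knights') do |tmp|
  tmp.binmode
  tmp.write("name;id\ntrépassé;10\n".encode('ISO-8859-1'))
  tmp.flush

  # Read as ISO-8859-1 and convert to UTF-8 on the fly.
  File.open(tmp.path, encoding: 'ISO-8859-1:UTF-8') do |file|
    content = file.read
    puts content.encoding        # => UTF-8
    puts content.valid_encoding? # => true
    puts content.lines.last      # => trépassé;10
  end
end
```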

### Database client

The database is accessed through a dedicated client.
This client is managed by `ActiveRecord` with some configuration (`database.yml` in a Rails application) where the `encoding` parameter can be defined.

### SQL Command

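By default, PostgreSQL's `COPY` follows the database client encoding, and MySQL's `LOAD DATA INFILE` uses the database character set. Both commands accept a dedicated parameter to override this (`ENCODING` for `COPY`, `CHARACTER SET` for `LOAD DATA INFILE`).
This is the purpose of the `encoding` parameter of `CSV Fast Importer` (PostgreSQL only).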
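
For PostgreSQL, this override is the `ENCODING` option of `COPY`. The hypothetical `copy_statement` helper below only illustrates that clause. It is not how CSV Fast Importer builds its query, and it assumes table and column names come from trusted input:

```ruby
# Hypothetical helper: builds a PostgreSQL COPY statement whose ENCODING
# option overrides the client encoding for this command only.
# Identifiers are interpolated as-is, so they must come from trusted input.
def copy_statement(table, columns, col_sep: ';', encoding: nil)
  quote = ->(value) { "'#{value.to_s.gsub("'", "''")}'" } # SQL string literal
  options = ['FORMAT csv', "DELIMITER #{quote.(col_sep)}"]
  options << "ENCODING #{quote.(encoding)}" if encoding
  "COPY #{table} (#{columns.join(', ')}) FROM STDIN WITH (#{options.join(', ')})"
end

puts copy_statement('knights', %w[name id], encoding: 'ISO-8859-1')
# => COPY knights (name, id) FROM STDIN WITH (FORMAT csv, DELIMITER ';', ENCODING 'ISO-8859-1')
```

Without `encoding:`, the clause is omitted and `COPY` falls back to the client encoding.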

### Database server

Each PostgreSQL database is created with a specific encoding. You can list databases and their encodings with the following command:

```shell
psql -l
```

Or, from the `psql` client:

```sql
\l
```
25 changes: 20 additions & 5 deletions spec/csv_fast_importer_spec.rb
@@ -22,13 +22,28 @@
end

describe 'with custom file encoding', skip_mysql: true do
-before do
-file = write_file [ %w(name id), %w(trépassé 10) ], encoding: 'ISO-8859-1'
-CsvFastImporter.import file, encoding: 'ISO-8859-1'
+let(:file) { write_file [ %w(name id), %w(trépassé 10) ], encoding: 'ISO-8859-1' }
+
+context 'with CSVFastImporter custom encoding' do
+before do
+CsvFastImporter.import file, encoding: 'ISO-8859-1'
+end
+
+it 'must import with correct encoding' do
+db.query('SELECT name FROM knights').to_s.should eql 'trépassé'
+end
end

-it 'must import with correct encoding' do
-db.query('SELECT name FROM knights').to_s.should eql 'trépassé'
+context 'with File encoding conversion' do
+before do
+file_with_conversion = File.new file.path, internal_encoding: 'UTF-8',
+external_encoding: 'ISO-8859-1'
+CsvFastImporter.import file_with_conversion
+end
+
+it 'must import with correct encoding' do
+db.query('SELECT name FROM knights').to_s.should eql 'trépassé'
+end
end
end

2 changes: 1 addition & 1 deletion spec/support/csv_writer.rb
@@ -14,7 +14,7 @@ def create(content, options = {})
csv << line
end
end
-File.new new_file
+File.new new_file, options
end

def new_temp_folder